A METHOD FOR SEARCHING HETEROGENEOUS
COMPOUND DATABASES USING TOPOMERIC SHAPE 
DESCRIPTORS AND PHARMACOPHORIC FEATURES 

A portion of the disclosure of this patent document contains material which is subject to 
copyright protection. The copyright owner has no objection to the facsimile reproduction by 
anyone of the patent document or the patent disclosure, as it appears in the Patent and 
Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. 

BACKGROUND OF THE INVENTION 

Field of the Invention: 

This invention relates generally to the field of pharmaceutical research and to the three 
dimensional searching of structures of chemical compounds to identify compounds which may 
share a biological activity with a known compound. In particular the invention concerns a 
method for searching databases of commercially available compounds which may or may not 
share any common synthetic lineage.
Description of Related Art: 

The advent of high throughput screening of chemical compounds for biological activity 
has dramatically changed the paradigm of pharmaceutical research in recent years. Coupled with 
combinatorial synthesis, it is now possible to test millions of compounds on an efficient basis. 
However, the cost per hit of such searching remains extremely high given the enormous number 
of compounds which can be tested and the typically low "hit" rates which are achieved. As a 
result, greater emphasis has been placed on the testing of compound libraries which are believed 
to contain a higher percentage of potentially relevant molecules. The skills of computational
chemists have been employed to design such compound libraries for testing. 

Two types of libraries were considered possible: first, a library which explored the
diversity of structures in chemical space across the range of compounds which could be 
synthesized without oversampling the same area of diversity space (redundant testing); and 
second, a library in which the compounds would be likely to have the same biological activity 
as a known molecule or drug. The major problem confronting computational chemists in the 
selection of compounds for such libraries was how to characterize the compounds in a manner 
which would permit the desired selections. Bioscientists have long known that the three 
dimensional shape of a compound which acts as a ligand to a larger biomolecule must be 
complementary to the shape of the binding site of the larger biomolecule. In studying the
relationships between the chemical structure of a molecule and its biological activity (structure
activity relationships (SAR)), many techniques to characterize the three dimensional shape of
molecules were devised. One of the most successful of the techniques for generating a 
quantitative structure activity relationship (QSAR) characterized the shape of molecules by 
defining an interaction energy field between a probe molecule and each part of the studied 
molecule in a three dimensional grid surrounding the molecule. The shape data thus generated 
for a series of molecules could be correlated with the biological activity of the molecules to 
produce the QSAR. This technique by Cramer and Wold (Comparative Molecular Field Analysis 
[CoMFA]) is described in detail in U.S. Patent No. 5,025,388 and U.S. Patent No. 5,307,287.
Use of the CoMFA approach required detailed considerations of two major factors: 1)
the proper alignment of the test molecules; and 2) the conformation or conformations of the
molecules which had to be taken into account. In addition, the technique worked only with 
molecules sharing the same biological activity. However, the technique clearly demonstrated the 
power of utilizing three dimensional shape descriptors in molecular analysis. 

Over time many three dimensional shape descriptors and methods of library selection 
were attempted by computational chemists. U.S. Patent No. 5,703,792 to Chapman describes 
one such approach. Two major problems confronted the field and cast doubt on the generality 
or accuracy of all the methods which had been devised. The first problem was that no one could 
show that the molecular structural descriptors which had been used were generally valid; that 
is, that the descriptors described molecules in a manner which correlated with biological activity 
across a range of biological systems. Any descriptor which would be used to select compounds 
for libraries would have to be valid irrespective of the biological activity which might be tested 
against the library. The second problem was that there was likewise no way to demonstrate that 
the methods of handling multiple conformations in the prior art methods were either accurate 
or applicable across all types of molecules. 

The solutions to these problems by Cramer, Patterson, Clark, and Ferguson are taught in
U.S. Patent No. 6,185,506. The validity of a molecular structural descriptor can be 
demonstrated across multiple biological activities by employing the Patterson plot methodology 
described in the patent. Both two and three dimensional descriptors can be evaluated by the 
methodology, and, in principle, there is no limitation on the dimensionality of the descriptors
which can be evaluated. Using the validation technique, valid descriptors were identified which
could be used with assurance to design libraries having desired properties. By this method the
two dimensional prior art fingerprint Tanimoto descriptor was shown to be valid as well as a 
new three dimensional descriptor described below. The validation methodology also identified 
a neighborhood distance characteristic of the descriptors which could be used in the design of 
the libraries. In addition, the neighborhood distance led directly to methods for searching the 
libraries, and, once a molecule had shown activity in a screen, for expanding the search for 
other molecules having the same activity. 

Further, a solution to the problem of identifying a generally appropriate molecular 
conformation or conformations to take into account was taught. An alignment rule for molecular 
parts (topomeric alignment) is demonstrated which generates a uniform orientation. The shape 
of the molecular part is characterized, as in CoMFA, by a field of interaction energies calculated 
between a probe and the atoms in the aligned molecular part at each point in a three dimensional 
grid surrounding the molecular part. The steric interaction energies are principally used 
although, in the appropriate circumstances, electrostatic interaction energies may be added. 
Although the alignment may be arbitrary and unlikely for any particular molecule, the field 
shape descriptor of the topomeric alignments was shown to be a valid molecular structural 
descriptor by means of the Patterson plot method. 

Using descriptors having an associated neighborhood distance, molecules could be 
identified which shared shape characteristics in a way which was meaningfully related to their 
biological activity. The problems of efficient library design and selection of combinatorially 
accessible molecules could be further addressed. In U.S. Patent Application No. 08/903,217, 
presently allowed, the construction and searching of a virtual library is described. The virtual
library contains validated molecular structural descriptions of each component part which could 
be used in a specified combinatorial synthesis. All possible product molecules which could be 
combinatorially derived from the component parts can be searched, without the necessity of 
generating the product structures during the search, for product molecules having desired 
properties by searching through only a combination of the descriptors of the component parts 
of the product molecules. In the preferred embodiment the Tanimoto and the three dimensional 
topomeric CoMFA descriptors are employed. 

Due to the combinatorial nature of the number of product molecules whose characteristics 
can be determined, a relatively small number of structural variations (tens of thousands), cores, 
and synthetic schemes employing only two attachment points can yield a searchable library of 
billions of possible molecules according to the method of the patent. Indeed, the number of 
searchable molecules outnumbers the number of molecules ever reported by several orders of 
magnitude. By the techniques disclosed in the patent, this virtual library can be searched very 
fast to construct diverse libraries of molecules likely to share the same biological activity or to 
find molecules which share the same biological activity as a combinatorially derived query 
molecule. Further, query molecules which derive from unknown synthetic routes can be 
fragmented and the molecular descriptor characterization of the fragments used to search for 
similarly shaped fragments and potential molecules with likely similar biological activity defined 
in the virtual library. In practice the topomeric field molecular structural descriptor has proven 
to be very valuable in searching the virtual library. The powerful and fast searching capabilities 
of the virtual library method have yielded significant advances.

However, the molecules in the virtual library which can be searched by definition derive 
from a combinatorial assembly of a relatively small number of constituent parts and can be said
to be homogeneous in that sense. By virtue of the exceedingly large size of the virtual library,
molecules may be identified which are not readily available. Also, although the possible product
molecules which can be searched are the result of known combinatorial synthetic schemes, the
actual synthesis may not be easily achieved. In the day to day world of pharmaceutical research,
large assemblages of available molecules can be commercially obtained. These assemblages are
not the result of any particular combinatorial synthesis but rather represent the assembly of a
wide range of molecules from many different sources and syntheses, some known, some
unknown. Therefore, these assemblages of molecules can be characterized as heterogeneous.

It would be useful if heterogeneous assemblages of available molecules could be searched
for molecules which are likely to have a biological activity similar to a known compound before
synthesis of new compounds is undertaken with the concomitant additional time and expense.

BRIEF SUMMARY OF THE INVENTION

Databases which contain the structures of a heterogeneous assembly of available molecules
can be searched for molecules having a biological activity similar to a known compound. Each
molecule specified by the database is split into several fragments according to defined rules and
the shape of those fragments is compared to the shape of the fragments generated from a query
molecule using the topomeric field molecular structural descriptor. The molecules having the
closest matching shapes to the query molecule are selected for further testing.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1 shows a number of possible ways to fragment a molecule into two pieces in 
accordance with the fragmentation rule. 

Figure 2 shows a number of possible ways to fragment a molecule into three pieces in 
accordance with the fragmentation rule.

DETAILED DESCRIPTION OF THE INVENTION 
Computational Environment: 

Generally, all calculations and analyses to perform the method of the disclosed invention 
are implemented in a modern computational chemistry environment using software designed to 
handle molecular structures and associated properties and operations. For purposes of this
Application, such an environment is specifically referenced. In particular, the computational
environment and capabilities of the SYBYL, UNITY, and CONCORD software programs
developed and/or marketed by Tripos, Inc. (St. Louis, Missouri) are specifically utilized. The
software code to implement the method of the disclosed invention is set out in the Appendices
to this Application. Software with functionality similar to SYBYL, UNITY, and CONCORD is
available from other sources, both commercial and non-commercial, well known to those in the
art. A general purpose programmable digital computer with ample amounts of memory and hard
disk storage is required for the implementation of this invention. In performing the methods of
this invention, representations of thousands of molecules and molecular structures as well as
other data may need to be stored simultaneously in the random access memory of the computer
or in rapidly available permanent storage. The inventors use Silicon Graphics, Inc. (SGI)
"R12000" computers having 350 - 400 MHz processors and between 256 Mb and 512 Mb of
memory with 8 - 10 Gb hard drive storage disks. In addition SGI "Origin" or "O2" or "O2100"
computers can be used. Access to several gigabytes of storage and faster Silicon Graphics, Inc. 
processors is useful. 
Incorporation of Patent Disclosures: 

The disclosures of U.S. Patent 6,185,506 and of U.S. Patent Application No. 08/903,217 
are expressly and completely incorporated into this application as if fully set forth herein. 
Topomeric Alignment: 

As taught in the incorporated U.S. Patent and patent application, molecular fragments 
may be aligned following topologically-based rules to generate a single, consistent, 
unambiguous, aligned topomeric conformation. The procedure also takes full account of chiral 
atoms. All fragments which are to be compared in a search must be aligned with the same 
topomeric rules. In the present method such a topomeric alignment is used, the details of which 
are fully set out in the attached software code. 
Calculation Of Fields: 

The basic CoMFA methodology provides for the calculation of both steric and 
electrostatic fields. It has been found to date that using only the steric
fields yields a better molecular structural descriptor than a combination of steric and electrostatic 
fields. There appear to be three factors responsible for this observation. First is the fact that 
steric interactions - classical bioisosterism - are certainly the best defined and probably the most 
important of the selective non-covalent interactions responsible for biological activity. Second, 
adding the electrostatic interaction energies may not add much more information since the
differences in electrostatic fields are not independent of the differences in steric fields. Third, 
the addition of the electrostatic fields will halve the contribution of the steric field to the 
differences between one shape and another. This will dilute out the steric contribution and also 
dilute the neighborhood property. Clearly, reducing the importance of a primary descriptor is 
not a way to increase accuracy. However, it is certainly possible that in a given special situation 
the electrostatic contribution might contribute significantly to the overall "shape". Under these 
unique circumstances, it would be appropriate to also use the electrostatic interaction energies 
or other molecular characterizers, and such are considered within the scope of this disclosure. 
In particular, as will be discussed below, it has been found that the additional information 
typically associated with pharmacophore mapping can be utilized to further characterize the 
similarity between topomerically aligned molecular fragments. 

The steric fields of the topomerically aligned molecular fragments are generated almost 
exactly as in a standard CoMFA analysis using an sp3 carbon atom as the probe. In standard
CoMFA, both the grid spacing and the size of the lattice space for which data points are
calculated will depend on the size of the molecule and the resolution desired. Typically, a 2 Å
grid spacing is employed both in CoMFA and in the heterogeneous database searching method
of the present disclosure. However the grid dimensions are varied in the present invention. For
query molecules, the size of the grid is adjusted to encompass the smallest region that all of the 
query fragments will fit into. This significantly reduces the number of calculations that are 
necessary without reducing the ability of the descriptor to fully characterize the structures. This 
modification will be discussed in more detail below. The steric fields are set at a cutoff value
(maximum value) as in standard CoMFA for lattice points whose total steric interaction with any 
side-chain atom(s) is greater than the cutoff value. 

One difference from the usual CoMFA procedure is that atoms which are separated by 
one or more rotatable bonds are set to make reduced contributions to the overall steric field. An 
attenuation factor, preferably about 0.85, is applied to the steric field contributions which result 
from these atoms. For atoms at the end of a long molecule, the attenuation factor produces very 
small field contributions (i.e., 0.85^N, where N is the number of rotatable bonds). This attenuation
factor is applied in recognition of the fact that the rotation of the atoms provides for a flexibility 
of the molecule which permits the parts of the molecule furthest away from the point of 
attachment to assume whatever orientation may be imposed by the unknown receptor. If such 
atoms were weighted equally, the contributions to the fields of the significant steric differences 
due to the more anchored atoms (whose disposition in the volume defined by the receptor site 
is most critical) would be overshadowed by the effects of these flexible atoms. 
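The attenuation described above can be illustrated with a minimal sketch. The function name and signature are hypothetical and are not taken from the Appendices; it is assumed here that N is counted as the number of rotatable bonds separating the atom from the point of attachment.

```python
def attenuated_contribution(raw_energy: float,
                            rotatable_bonds: int,
                            factor: float = 0.85) -> float:
    """Scale one atom's steric field contribution by factor ** N.

    raw_energy      -- unattenuated steric interaction energy at a grid point
    rotatable_bonds -- N, assumed to be the number of rotatable bonds between
                       the atom and the attachment point (an assumption)
    factor          -- attenuation factor; the text prefers about 0.85
    """
    if rotatable_bonds < 0:
        raise ValueError("rotatable_bonds must be non-negative")
    return raw_energy * factor ** rotatable_bonds


# Relative weights for atoms 0, 1, 5 and 10 rotatable bonds away.
weights = [0.85 ** n for n in (0, 1, 5, 10)]
```

With a factor of 0.85, an atom ten rotatable bonds away contributes roughly one fifth of its unattenuated value, which is consistent with the stated intent that flexible, distant atoms not overshadow the more anchored atoms.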
Topomer Similarity: 

The notion of topomer similarity between a pair of molecules is defined as the "distance" 
represented by the difference between the molecular fields which serve to characterize the 
molecules' shapes. As an example, assume two molecules A and B which have each been 
placed in their topomeric alignment and the steric field values calculated for each point in the 
surrounding three dimensional grids. Let each grid point be denoted by its corresponding 
cartesian X, Y, Z coordinate so that for each molecule the grid points are defined as
$X_0, Y_0, Z_0, \ldots, X_N, Y_N, Z_N$. For each molecule A and B the field values, $V^A$ and $V^B$, at each point in the
grid are denoted as:

$$V^A_{X_0}, V^A_{Y_0}, V^A_{Z_0}, \ldots, V^A_{X_N}, V^A_{Y_N}, V^A_{Z_N} \quad \text{and} \quad V^B_{X_0}, V^B_{Y_0}, V^B_{Z_0}, \ldots, V^B_{X_N}, V^B_{Y_N}, V^B_{Z_N}.$$

The root sum square of the differences between the fields is then defined as:

$$\sqrt{\left(V^A_{X_0} - V^B_{X_0}\right)^2 + \left(V^A_{Y_0} - V^B_{Y_0}\right)^2 + \left(V^A_{Z_0} - V^B_{Z_0}\right)^2 + \cdots + \left(V^A_{Z_N} - V^B_{Z_N}\right)^2}$$

This distance is conveniently denoted as:

$$(A:B)$$

For identical molecular structures, the distance equals 0. Therefore, the closer the value of the 
distance is to zero, the closer in shape two molecules will be. When searching among many 
possible structures, the minimum calculated value of the distance is sought. 
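As a minimal sketch of the distance just described, the following computes the root sum of squared differences between two field vectors. The function name is hypothetical, and it is assumed that both fields have been sampled on the same grid and flattened to equal-length sequences; the code in the Appendices may organize the data differently.

```python
import math
from typing import Sequence


def topomer_distance(field_a: Sequence[float],
                     field_b: Sequence[float]) -> float:
    """Root sum of squared differences between two steric fields.

    Both sequences are assumed to hold field values at the same grid
    points, in the same order.
    """
    if len(field_a) != len(field_b):
        raise ValueError("fields must be sampled on the same grid")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(field_a, field_b)))
```

Identical fields give a distance of 0, and a search would retain the candidates with the smallest returned values.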
Fragmentation: 

The following critical question which frequently occurs in chemical research, and 
especially in biological research, can now be addressed. The problem, as it is usually presented, 
takes the form: given an arbitrary query molecule (generally one previously found to exhibit a 
desired activity), find biologically similar molecules, that is molecules of similar 3D shape and 
activity. Generally, such a query molecule will not have resulted from a combinatorial synthesis, 
and, in fact, no knowledge of a possible synthetic route to the molecule may be available. In 
searching the virtual library of Application No. 08/903,217, the topomeric 3D shape data within 
the virtual libraries actually describe fragments (structural variations) of molecules. To find 
similarly shaped molecules within the virtual library, the query molecule must be fragmented 
and the shapes of its fragments compared with the shapes of corresponding fragments (structural
variations) in the virtual library. The difficulty is that a query molecule can be fragmented in 
so very many ways. The solution adopted for virtual library searching was a way to emphasize 
those fragmentations that are most likely to conform to efficient synthetic routes from available 
starting materials, without requiring the searcher of the virtual library to have any knowledge 
of what synthetic routes it includes. 

The solution employed a "fragmentation table", where each row constitutes a rule of the 
following sort: "for each occurrence of this particular structural feature combination (structural 
variation) in the query molecule, decompose the query molecule in a particular way specified 
in terms of this structural feature, and search only those combinatorial libraries that utilize 
specified reactions (sequences) and/or building blocks, mapping specified query fragments onto 
specified classes of building blocks". Each such query decomposition found generates a search 
of the virtual library, returning all those products whose sum of squares of differences in shape 
between corresponding product and query fragments is less than a user specified neighborhood 
distance threshold. Passing the query molecule (by means of a suitable computer program) 
against all the rows of this table generates all searches. 

The situation is much more complicated when a search of a database of heterogeneous 
compounds is desired. Not only is it necessary to fragment the query molecule, but each 
molecule in the database has to be likewise fragmented and comparisons made between the query 
fragments and the fragments arising from each molecule. Typically, anywhere from 2 to 50 
different fragments might be generated by fragmenting each molecule in the database. To 
compare 6 fragments from a query molecule to an average of 20 fragments from each of 50,000
molecules in a heterogeneous database would require 6 × 20 × 50,000 = 6,000,000 field
comparisons. [Actually, as will be described below, because fragment pairs or triplets are
involved, cross comparisons increase this number.] This is at least an order of magnitude greater
than the typical 6 fragment query comparison to even 50,000 structural variations in the virtual
library. In principle, a virtual library of every fragment occurring in all of the molecules in all
examined heterogeneous databases could be assembled, but the size of such a virtual library and
the complexities of searching are not trivial. 

The method adopted for the present invention does not precalculate and store the metric 
characteristics of each fragment of each heterogeneous database molecule. Rather, as each
molecule is fragmented, the topomeric alignment and associated field is generated on-the-fly for 
each fragment and compared to the topomerically aligned field of a query molecule fragment. 
While the full fragmentation table scheme employed with the virtual library of Application No. 
08/903,217 may be employed, experience with fragmentations has shown that for medicinal type 
molecules the following fragmentation rule (which is a subset of the more general fragmentation 
method) produces meaningful fragments: 

"Break the molecule at acyclic bonds either singly or in pairs to generate sets of either
2 or 3 fragments respectively where each fragment must contain greater than a user
specified number of heavy atoms."

Assuming a setting that every fragment must contain at least three heavy atoms, Figure 
1 shows an example of how the rule is applied in a typical molecule (either a query molecule 
or a database molecule) to generate fragments. To generate the fragments, the whole structure
is evaluated for each new fragmentation position. The two-piece fragmentations which will be
performed are indicated by the thick lines. The two-piece fragmentations that will not be
performed (because one of the resulting fragments contains fewer than three heavy atoms) are
indicated by the thin lines. In this example, if, instead of requiring three heavy atoms, the user 
required five heavy atoms, then only the fragmentation between the two rings would be 
performed. 
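The two-piece case of the fragmentation rule can be sketched as follows. This is an illustrative sketch only, not the code of the Appendices: the molecule is represented as a hypothetical hydrogen-suppressed graph (atoms numbered from 0, bonds as index pairs), the molecule is assumed to be a single connected structure, a bond is treated as acyclic when cutting it disconnects the graph, and the threshold is applied as "at least min_heavy heavy atoms", following the Figure 1 discussion.

```python
from typing import Dict, List, Set, Tuple

Bond = Tuple[int, int]


def _side(adj: Dict[int, Set[int]], start: int, cut: Bond) -> Set[int]:
    """Atoms reachable from `start` without crossing the cut bond."""
    blocked = {cut, (cut[1], cut[0])}
    seen, stack = {start}, [start]
    while stack:
        atom = stack.pop()
        for nbr in adj[atom]:
            if (atom, nbr) not in blocked and nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return seen


def two_piece_fragmentations(n_atoms: int, bonds: List[Bond],
                             min_heavy: int = 3
                             ) -> List[Tuple[List[int], List[int]]]:
    """List every single-bond cut that yields two large-enough fragments."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)

    results = []
    for a, b in bonds:
        side_a = _side(adj, a, (a, b))
        if b in side_a:
            continue  # still connected after the cut, so this is a ring bond
        side_b = set(range(n_atoms)) - side_a
        if len(side_a) >= min_heavy and len(side_b) >= min_heavy:
            results.append((sorted(side_a), sorted(side_b)))
    return results


# Hypothetical example: a three-membered ring (atoms 0-2) bearing a
# three-atom chain (atoms 3-5).
ring_chain = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5)]
fragments = two_piece_fragmentations(6, ring_chain, min_heavy=3)
```

In this example only the bond joining the ring to the chain is cut when three heavy atoms are required, since every other acyclic cut leaves a fragment that is too small. Raising the threshold removes fragmentations, as in the five heavy atom case discussed for Figure 1.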

An example of a three piece fragmentation is shown in Figure 2. Assuming again a 
setting that every fragment must contain at least three heavy atoms, the heavy lines indicate by 
arrows the two positions at which the molecule would be fragmented into 3 fragments. The light
lines indicate by arrows some of the three piece fragmentations that will not be performed 
because at least one of the fragments has fewer than three heavy atoms. If, instead of requiring 
three heavy atoms, the user required five heavy atoms, then no three-piece fragmentations would 
be performed. 

At the present time, it has been found that generating three fragments is necessary when 
a two fragment scheme does not yield significant results. The three fragment scheme seems to 
find similar shapes that are sometimes missed in two fragment analysis. However, due to the 
higher computational overhead of three fragment searching, searches are first performed at the 
two fragment level. Four fragment searches may be necessary for some types of molecules, but 
at the time of filing the present disclosure, such situations have not been identified. Clearly the 
searching method of the present invention is not limited to the number of fragments which are 
generated but is generally applicable to as many fragments as the user wishes to consider.
Topomeric 3D Searching: 

When analyzing molecules for shape similarity, it should be recognized that not all the 
elements of a molecule's shape may be required for proper interaction with a larger biomolecule. 
Perhaps in some instances, the entire shape is critical to the match. In other instances, only part 
of the molecule's shape may be critical to the match and other parts relatively unimportant. 
When comparing shapes of query molecules to those found in a heterogeneous database, it is
important to be able to compare not only the overall shape of the molecules, but also subparts. 
The method and software of the present invention permit many types of shape comparisons as 
will be discussed below. 

Different heterogeneous databases of compounds store compound structures in different
formats such as SMILES, SLN, or an MDL format. Many software programs are available for 
interconverting the structures from one format to another. For the present application, the 
inventors use UNITY to convert compound information to SLN (Sybyl Line Notation) format. 
Compound information is then transferred to the CONCORD software program. CONCORD 
generates the three dimensional structure of the molecule. The starting point for topomeric 
searching of compounds listed in a heterogeneous database is the CONCORD generated three
dimensional structures of the database molecules and the query molecule. These structures are 
provided as input to the software programs set forth in the Appendices to the present disclosure. 

The user specified fragmentation pattern (2 or 3 fragments and the number of included 
heavy atoms) is applied to the query molecule and the first database specified molecule. After 
each set of shape comparisons, the next database specified molecule is taken up in order. After
the fragmentation patterns have been identified for each molecule (query or database), each 
fragment is aligned according to the topomeric rules. 

In the preferred embodiment, the fragment is translated and placed into the grid so that 
the atom from which the "broken" acyclic bond extends into the fragment of interest is placed 
at the 0,0,0 coordinate. The "broken" bond (the attachment bond) is then directed along the X 
axis (standard topomer alignment) and the part of the molecule which is considered the fragment 
is aligned topomerically in the gridded space. Alternatively, the atom in the fragment of interest
which is connected to the acyclic bond which is "broken" is placed at the 0,0,0 position. This
results in virtually insignificant differences in the topomer distances which are calculated. 

Another feature of the present method is that a variable size grid region is used. Since 
some fragments are small and others large, the same volume of three dimensional grid space is 
not required to contain each fragment. Nothing is gained by placing a small fragment in a large 
grid space and only results in calculating an unnecessary number of extra grid location 
interactions. For the query molecule, the grid is adjusted to encompass the smallest region in 
which all the query fragments will fit. For database molecule fragments, the initial database 
molecule grid is one unit larger in all dimensions than the grid determined for the query
fragments. The grid size is expanded by one unit in each dimension until the accumulated sum 
of the grid intersection points (starting with the query grid size and adding all the intersection 
points contained in each expanded grid) is greater than 10,000 or the grid has been expanded 
from its initial size by 11 units in each dimension. This procedure is followed since most 
computers, even those configured for molecular modeling, have a memory capacity which can
be exceeded by allowing for unlimited grid size and number of intersection points. The grid size 
limitations are not required by the inherent method of the invention. Compression of the data 
from the thousands of data points in a large grid also aids in reducing the memory requirement 
for large grids. When a situation is encountered where the database molecular fragment extends 
outside of the maximum grid size, an "outside of the grid" factor is applied by multiplying the
number of atoms outside the grid by the maximum interaction energy possible (typically 900)
and adding that value as an additional term in the root sum of squares similarity calculation. The
use of dynamic grid sizing increases the throughput performance of the method considerably. 
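The "outside of the grid" factor can be sketched as below. The text does not state exactly how the extra term enters the calculation; this sketch assumes that the penalty (number of outside atoms times the maximum interaction energy) is squared and summed with the squared field differences before the square root is taken. The function name and that interpretation are assumptions, not the method of the Appendices.

```python
import math
from typing import Sequence


def distance_with_outside_penalty(field_a: Sequence[float],
                                  field_b: Sequence[float],
                                  atoms_outside_grid: int,
                                  max_energy: float = 900.0) -> float:
    """Root sum of squares distance with an extra out-of-grid term.

    Assumption: the penalty is treated as one more difference term, so
    its square is added to the sum of squared field differences.
    """
    if len(field_a) != len(field_b):
        raise ValueError("fields must be sampled on the same grid")
    sum_sq = sum((a - b) ** 2 for a, b in zip(field_a, field_b))
    penalty = atoms_outside_grid * max_energy
    return math.sqrt(sum_sq + penalty ** 2)
```

Under this assumption a fragment with no atoms outside the grid is scored exactly as in the ordinary distance, while each atom outside the grid pushes the candidate away from the query by an amount comparable to a grid point at the cutoff energy.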
Whole Molecule Two Piece Comparisons: 

As noted, for a two piece comparison both the query molecule and the database molecule 
are always split into just two pieces at each acyclic bond starting with the whole molecule each 
time. If there are 4 acyclic bonds and the heavy atom count matches the user selected value 
(the default is typically 4), four two fragment pairs will be generated. As an example of the shape
comparison, consider a query molecule which can only be broken at one acyclic bond to form 
fragments A and B. Consider also that a database molecule can only be broken at one acyclic 
bond into fragments C and D. Among the four fragments, there are two sets of comparisons 
possible: A:C & B:D, and A:D & B:C. A first comparison is made between: A:C and B:D. [In 
the actual calculation the squared differences in the field values between each grid location in 
each fragment are kept and the square root is only taken at the end of the comparison process.] 
Thus for the A:C & B:D comparison, a distance is determined as: 
$$\sqrt{(A:C)^2 + (B:D)^2}$$

This value is retained for comparison. For the A:D & B:C comparison, a distance is determined
as:

$$\sqrt{(A:D)^2 + (B:C)^2}$$

This value is compared to the value determined for the first A:C & B:D set and the lower value
(greater similarity) retained. Thus, there are two comparisons for each pair of molecules. It has
been found that generally one will be significantly more similar than the other. The lower (more
similar) value is retained and compared to the values obtained for the query against every other
molecule in the database. Ultimately, the molecules in the database which are most similarly
shaped to the query molecule will be determined by those with the smallest field difference.

As a further example consider a query molecule which can be broken at four acyclic
bonds to form four two fragment pairs and a database molecule which can be broken at five
acyclic bonds to form five two fragment pairs. This may be represented as:

Query       Database
A  B        I  J
C  D        K  L
E  F        M  N
G  H        O  P
            Q  R

The first comparison will be A:I & B:J and A:J & B:I. A second comparison will be A:K 
& B:L and A:L & B:K. Similar comparisons will be obtained between each query fragment pair 
and each database molecule fragment pair. Of all the comparisons, the one having the smallest 
difference in field value will be kept for further comparison to the values obtained for all the 
molecules in the database. These comparisons are whole molecule comparisons because each 
fragment of the query molecule is compared to each fragment of every database molecule in sets 
of two (representing a complete molecule). 
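
The two piece comparison described in this section may be sketched in C as follows. This is a minimal illustration under stated assumptions, not the code of the Appendices: the types Fragment and TwoPiece, the toy grid size N_POINTS, and the function names are all hypothetical. Squared values are compared throughout, which leaves the ranking unchanged, and the square root would be taken once at the end.

```c
#include <assert.h>
#include <stddef.h>

#define N_POINTS 4   /* toy grid size, for illustration only */

/* Hypothetical container for the field values of one topomer. */
typedef struct { double field[N_POINTS]; } Fragment;

/* One two piece split of a molecule at a single acyclic bond. */
typedef struct { Fragment first, second; } TwoPiece;

/* Squared field difference between two fragments, e.g. (A:C)^2. */
static double sq_diff(const Fragment *x, const Fragment *y)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < N_POINTS; i++) {
        double d = x->field[i] - y->field[i];
        sum += d * d;
    }
    return sum;
}

/* Lower of the two pairings A:C & B:D and A:D & B:C (squared). */
double two_piece_sq(const TwoPiece *q, const TwoPiece *d)
{
    double direct  = sq_diff(&q->first, &d->first)  + sq_diff(&q->second, &d->second);
    double crossed = sq_diff(&q->first, &d->second) + sq_diff(&q->second, &d->first);
    return direct < crossed ? direct : crossed;
}

/* Smallest squared distance over every query split against every database split. */
double best_two_piece_sq(const TwoPiece *q, size_t nq,
                         const TwoPiece *d, size_t nd)
{
    double best = -1.0;
    size_t i, j;

    assert(nq > 0 && nd > 0);
    for (i = 0; i < nq; i++) {
        for (j = 0; j < nd; j++) {
            double s = two_piece_sq(&q[i], &d[j]);
            if (best < 0.0 || s < best)
                best = s;
        }
    }
    return best;
}
```

For the four by five example above, best_two_piece_sq would be called with nq = 4 and nd = 5, evaluating forty pairings in all and keeping the smallest.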

Whole Molecule Three Piece Comparisons: 

If a three piece fragmentation scheme is employed the same shape comparison principles 
apply but are further complicated by the presence of the central fragment. In two piece 
fragmentation, each fragment has only one attachment bond which may be placed at the 0,0,0, 
grid coordinate. There is, therefore, only one topomeric alignment for the fragment. However, 
the central fragment in a three piece fragmentation will have two attachment bonds, one each at 
the points where the two side fragments have been severed. There will, therefore, be two starting 
points for the topomeric alignment which will result in a different topomer shape of the aligned 
fragment. Each of these shapes must be included in the comparison. 

As an example consider a query and a database molecule, each of which may be broken into 
three three piece fragmentations: 

    Query    Database
    A        J
    B        K
    B'       K'
    C        L

    D        M
    E        N
    E'       N'
    F        O

    G        P
    H        Q
    H'       Q'
    I        R

The primed fragments represent the second orientation of the central fragment of the 
three. Fields are calculated for all fragments as before. Considering just the first fragment set 
from both the query and database molecules the first set of distance comparisons are: A:J & B:K 
& B':K' & C:L and the distance is: 

$$\sqrt{(A:J)^2 + (C:L)^2 + [(B:K)^2 + (B':K')^2]/2}$$

The last term takes the average contribution of the center piece. Similarly, the other possible 
comparisons are calculated as: 

$$\sqrt{(A:L)^2 + (C:J)^2 + [(B:K')^2 + (B':K)^2]/2}$$

From the two sets of comparisons, the one with the lower field difference (more similar) is 
retained for comparison. All the other comparisons between each three fragment set of the query 
and each three fragment set of the database molecule are calculated and the one with the lowest 
field difference is retained for comparison with those generated for all the other database 
molecules. 
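
A minimal sketch of the three piece comparison for one query fragmentation against one database fragmentation is given below. It is illustrative only: the types Fragment and ThreePiece, the toy grid size, and the function names are hypothetical. In particular, it is an assumption of this sketch that when the side fragments are paired the other way around, the two orientations of the central fragment are also paired crosswise. Squared values are returned, with the square root left to the end.

```c
#include <assert.h>
#include <stddef.h>

#define N_POINTS 2   /* toy grid size, for illustration only */

/* Hypothetical container for the field values of one topomer. */
typedef struct { double field[N_POINTS]; } Fragment;

/* side1 and side2 are the end fragments; center and center_alt are the
 * two topomeric orientations of the central fragment (B and B'). */
typedef struct { Fragment side1, center, center_alt, side2; } ThreePiece;

/* Squared field difference between two fragments. */
static double sq_diff(const Fragment *x, const Fragment *y)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < N_POINTS; i++) {
        double d = x->field[i] - y->field[i];
        sum += d * d;
    }
    return sum;
}

/* Lower of the two pairings; the central fragment contributes the
 * average of its two orientation terms. */
double three_piece_sq(const ThreePiece *q, const ThreePiece *d)
{
    double direct, crossed;

    assert(q != NULL && d != NULL);
    direct = sq_diff(&q->side1, &d->side1) + sq_diff(&q->side2, &d->side2)
           + (sq_diff(&q->center, &d->center)
              + sq_diff(&q->center_alt, &d->center_alt)) / 2.0;
    /* Assumption: orientations are paired crosswise when the sides swap. */
    crossed = sq_diff(&q->side1, &d->side2) + sq_diff(&q->side2, &d->side1)
            + (sq_diff(&q->center, &d->center_alt)
               + sq_diff(&q->center_alt, &d->center)) / 2.0;
    return direct < crossed ? direct : crossed;
}
```

The value returned for each pair of fragmentations would then be minimized over all fragmentations in the same way as for the two piece case.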

One further complication which arises with three piece fragmentation is that it is 
sometimes necessary to apply an attachment bond penalty to the calculated distance to reflect 
differences in the structure. Since there are two attachment bond points, the spatial relationship 
between those points will influence the shape of the whole molecule. However, considering just 
the fragments will not totally reflect the shape characteristics specified by the spatial relationship 
of the attachment points. This is an attempt to preserve the three dimensional structure of the 
whole molecule. A penalty value is thus added to the shape differences (increasing the apparent 
difference or dissimilarity) to compensate. The penalty value is calculated as: 

This penalty value is multiplied by an arbitrary factor depending on the user's belief in the 
significance of the structural difference. The penalty is initially set at 10 in the code but might 
be set as high as 100. For example, consider the ortho, meta, and para positional 
attachment bonds on a ring. The overall molecular shape will vary significantly if two side 
chains are in the ortho versus the para position with respect to each other. Accordingly, for the 
1 atom difference of an ortho relationship, a penalty of 10 would be applied; for the 2 atom 
difference of a meta relationship, a 20 unit penalty would be applied; and for the 3 atom 
difference of a para relationship, a penalty of 30 would be applied. The point is that in 
determining the shape comparisons, a substituent can not just be moved around the ring and have 
it match without some penalty to reflect the difference in position. 
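
The ortho, meta, and para example can be reproduced with the small sketch below. It is derived only from that worked example (atom difference multiplied by the user chosen factor) and should not be read as the full penalty calculation of the Appendices; the function name attachment_penalty and its parameters are hypothetical.

```c
#include <assert.h>

/*
 * Penalty for a difference in the relative placement of the two
 * attachment bonds, following the ortho/meta/para example: the atom
 * difference times a user chosen multiplier (10 initially, up to 100).
 */
double attachment_penalty(int atom_difference, double multiplier)
{
    assert(atom_difference >= 0);
    assert(multiplier >= 0.0);
    return atom_difference * multiplier;
}
```

With a multiplier of 10, atom differences of 1, 2, and 3 give penalties of 10, 20, and 30 units respectively, as in the example.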

For large molecules, small changes in the number of atoms in the molecule are less likely 
to affect the overall shape than for small molecules. For effective shape comparisons, large 
structures need to be less sensitive to steric differences while small structures need to be more 
sensitive to steric differences. Experience has shown that there is a pivot point around 25 heavy 
atoms, with structures having more than 25 heavy atoms considered large. Increasing the weighting 
of the steric contributions for small structures and decreasing it for larger structures has been 
found with experimental data sets to cut the number of false positives in half for small structures 
and allow more hits for large structures without eliminating many small structure hits. 
Accordingly, for structures having more than 25 heavy atoms the steric field values calculated 
for each point in the grid may be decreased by as much as 33% (field values multiplied by 
0.67). For structures having fewer than 25 heavy atoms the steric field values calculated for each 
point in the grid may be increased by as much as 100% (field values multiplied by 2.0). A non- 
linear multiple seems to work best. 

In addition to using a variable grid size, another observation leads to a method of 
increasing the effectiveness and throughput of the searching methodology. It has been observed 
that for molecules which have a size difference of over +/- 12 heavy atoms, there is little 
likelihood of finding molecules which match in shape. Consider a query with 20 heavy atoms 
and a database molecule with 33 heavy atoms. Since to start with there will be 13 atoms in the 
database molecule which will not be matched in the query, a large distance (dissimilarity) will 
already be found due to the missing atoms. The likelihood that all of the remaining atoms will 
lie in equivalent positions so that only the missing atoms will contribute to the difference in field 
values (and hence in similarity) is vanishingly small. Experimental runs on known data sets bear 
out this observation. Before any fragmentation is done, the difference in heavy atom count of the 
query and database compound is determined, and, if the difference is greater than 12 heavy 
atoms, the comparison is skipped. 
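
The heavy atom prefilter amounts to a single test made before any fragmentation. A minimal sketch follows; the function name within_heavy_atom_window is hypothetical, and treating a negative limit as "filter disabled" is an assumption borrowed from the help text of the hevdiff option in the appended code.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Returns 1 if the pair should be compared, 0 if it should be skipped.
 * max_diff is the largest allowed heavy atom count difference, inclusive
 * (12 for whole molecule searching); a negative value disables the test.
 */
int within_heavy_atom_window(int query_hev, int db_hev, int max_diff)
{
    assert(query_hev >= 0 && db_hev >= 0);
    if (max_diff < 0)
        return 1;   /* filter disabled */
    return abs(query_hev - db_hev) <= max_diff;
}
```

For the example in the text, a 20 heavy atom query against a 33 heavy atom database molecule differs by 13 and is skipped when the limit is 12.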
Subset Searching: 

As noted above, only part of the shape characteristic of many molecules may be 
responsible for the binding of those molecules to larger biomolecules. Accordingly, a search is 
desired which would find whether any part of the query molecule has the same shape as any part 
of the database molecule. This can be thought of as a partial fragment match. The method of this 
invention directly permits this type of search to be conducted. The query molecule is fragmented 
into two parts and the database molecule is fragmented into three parts in as many different ways 
as possible. For each possible three piece fragmentation you get: 

    Query    Database
    E        A
    F        B
             C

In order to determine whether any part of the database molecule matches any part of the query 
the following comparisons are done: 

    E : A        E : B
    F : B        F : C

    F : A        F : B
    E : B        E : C

Since you are interested in locating any part of the database molecule which is closely similar 
in shape to all parts of the query molecule, the difference in heavy atom count exclusion which 
is applied to whole molecule searching is modified for subset matching. Instead of excluding the 
search if there is a +/- 12 heavy atom difference, for subset searching the exclusion is not 
applied unless there is a +/- 30 heavy atom difference. 
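
The four subset comparisons listed above may be sketched as follows for one query split (E, F) against one database fragmentation (A, B, C). This is illustrative only: the Fragment type, the toy grid size, and the function names are hypothetical, and it is an assumption of this sketch that each pair of terms is combined as a sum of squares and that the lowest of the four is kept. The second orientation of the central database fragment is ignored here for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define N_POINTS 2   /* toy grid size, for illustration only */

/* Hypothetical container for the field values of one topomer. */
typedef struct { double field[N_POINTS]; } Fragment;

/* Squared field difference between two fragments. */
static double sq_diff(const Fragment *x, const Fragment *y)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < N_POINTS; i++) {
        double d = x->field[i] - y->field[i];
        sum += d * d;
    }
    return sum;
}

/* q[0], q[1] are E and F; d[0], d[1], d[2] are A, B, C in chain order. */
double subset_sq(const Fragment q[2], const Fragment d[3])
{
    double best, s;

    assert(q != NULL && d != NULL);
    best = sq_diff(&q[0], &d[0]) + sq_diff(&q[1], &d[1]);   /* E:A & F:B */
    s = sq_diff(&q[0], &d[1]) + sq_diff(&q[1], &d[2]);      /* E:B & F:C */
    if (s < best) best = s;
    s = sq_diff(&q[1], &d[0]) + sq_diff(&q[0], &d[1]);      /* F:A & E:B */
    if (s < best) best = s;
    s = sq_diff(&q[1], &d[1]) + sq_diff(&q[0], &d[2]);      /* F:B & E:C */
    if (s < best) best = s;
    return best;
}
```

Each comparison matches the whole query against two adjacent pieces of the database molecule, so the unmatched third piece does not count against the hit.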
Core Searching: 

In some instances it is desirable to find another core of similar shape to a known core 
upon which a series of molecules may be built. For instance, suppose a patented series of 
compounds can be recognized as built upon a particular core. If that core can be replaced with 
a similarly shaped but chemically different core, it may be possible to construct an entirely new 
series of compounds active at the same site without infringing the patented series. To conduct 
this type of search the core and its two attachment bonds need to be specified. How the 
searcher decides on the core structure is up to the searcher. The core is aligned in its two 
possible topomeric orientations and the fields calculated. The topomerically aligned fields of only 
the central fragment of all possible three piece fragmentations of the database molecules are 
compared to the core fields as A:C & A':C: 

    Query    Database
    A        B
    A'       C
             D

Again, as before in the case of three fragment searching which involves a central 
fragment with two attachment positions, attachment penalties can be assigned to better 
characterize/distinguish the overall molecular shape based on where the attachment bonds are 
placed with respect to each other on the query core structure. For core searching, the penalty 
multiplier is typically set at 50. The molecules identified in the database which have central 
fragments generating the smallest values (greatest similarity) in the comparison to the specified 
core would be examined for possible use as cores. 
Features: 

As noted earlier, there may be some circumstances where the electrostatic field may be 
used in addition to the steric field to characterize the shape of a topomerically aligned fragment. 
A much more useful characterization has been implemented which extends ideas from 
pharmacophore modeling for use in searching heterogeneous databases of compounds. It is well 
recognized that certain characteristic interactions of molecules in addition to shape play an 
important role in determining whether that molecule will bind to a larger biomolecule. 
Complementarity of shape permits the molecules to approach each other closely enough for these 
interactions to take place. In pharmacophore modeling the presence and location of feature 
classes containing molecular characteristics thought important to the binding of the molecule is 
tracked as well as the distances and directions between the features. An absence of any given 
feature in a molecule or a different location is considered to significantly reduce the likelihood 
of that molecule's binding and, thus, typical pharmacophore modeling is an all or nothing 
proposition. Clearly, in the present methodology due to the topomeric alignment of fragments 
all distance and direction attributes of features present in the fragments are lost. 

However, an alternative approach to incorporating the characteristic interactions in 
conjunction with the shape similarity matching described above has proven to generate an 
exceedingly powerful and accurate discovery methodology. The classic five feature classes are 
employed: positive charge, negative charge, hydrogen-bond-donating, hydrogen-bond-accepting, 
and aromatic. When present in either the query molecule or the database molecule, the features 
are assigned X,Y,Z point locations in the topomer alignment either centered on the relevant 
atom, or, in the case of aromatic rings, the centroid of the ring is specified. Generating the 
topomer conformation of a molecular fragment not only fixes the steric shape of that fragment, 
but it also fixes the Cartesian coordinates of each pharmacophore feature contained within the 
fragment. The search strategy can be summarized as finding all the database molecule fragments 
which have features, similarly located in topomer space and similar in any other detailed feature 
property, that match each of the features in the topomerized fragments of the query structure. 

In keeping with the distance definitions used for steric shape similarity, differences in 
features are defined with the same dimensionality as shape so that both shape and features can 
be used to characterize a fragment for searching. Feature by feature differences are also 
combined in a root sum square rather than a straight sum fashion. Thus, a second feature 
mismatch would not be as costly as the first one. To determine the feature "distance", each of 
the pharmacophore features in the query structure is considered in turn, by identifying the 
closest feature of the same pharmacophore class in the database molecule fragment. If there is 
no such feature or if the nearest such feature is more than 1.5 Å distant, the dissimilarity sum 
of squares is increased by a maximum of 100 x 100 units. (Units are chosen to be commensurate 
with the steric shape units of kcal/mole-Angstrom^3.) If there is a matching feature within 0.5 Å, 
the dissimilarity is set to zero. For a feature separation between 0.5 Å and 1.5 Å the 
dissimilarity penalty increment is obtained by linear interpolation between 0 and 100 x 100 unit 
values. Further, it is possible to scale/weight the feature contribution to increase or decrease its 
relative contribution with respect to the steric contribution to the observed similarity (distance). 
Note that the use of the term "distance" with the feature searching methodology of the present 
invention is not meant to refer to an actual physical "distance" as considered in traditional 
pharmacophore techniques. For a two piece fragmentation the distance (similarity) between 
fragments is calculated as: 

    Query    Database
    A        C
    B        D

$$\sqrt{(A:C)^2_{FEATURES} + (A:C)^2_{STERIC} + (B:D)^2_{FEATURES} + (B:D)^2_{STERIC}}$$

The cross terms for the A:D and B:C comparisons follow a similar definition as earlier. It has 
been observed that if the value of: 

$$(A:C)^2_{FEATURES} + (B:D)^2_{FEATURES}$$

is too high, the distance will be large (little similarity) and the full calculation including the time 
consuming calculation of steric field can be skipped. This also increases the effectiveness and 
throughput of the method. 
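
The feature penalty and the early exit just described may be sketched as follows. This is a minimal illustration, not the code of the Appendices. The function and macro names are hypothetical, and the threshold used for the early exit is an assumption: the text says only that the feature value is "too high", and this sketch compares it against the square of a caller supplied distance cutoff. Since the steric terms can only add to the sum of squares, a pair whose feature terms alone exceed that value cannot fall under the cutoff.

```c
#include <assert.h>

#define FEATURE_MAX_SQ  (100.0 * 100.0)  /* "100 x 100 units" */
#define MATCH_RADIUS    0.5              /* Angstroms: treated as a full match */
#define MISS_RADIUS     1.5              /* Angstroms: treated as no match */

/*
 * Increment to the dissimilarity sum of squares for one query feature.
 * found is 0 when the database fragment has no feature of the same
 * class; separation is the distance to the nearest such feature.
 */
double feature_sq_penalty(int found, double separation)
{
    assert(separation >= 0.0);
    if (!found || separation >= MISS_RADIUS)
        return FEATURE_MAX_SQ;
    if (separation <= MATCH_RADIUS)
        return 0.0;
    /* Linear interpolation between 0 and the maximum. */
    return FEATURE_MAX_SQ * (separation - MATCH_RADIUS)
                          / (MISS_RADIUS - MATCH_RADIUS);
}

/*
 * Returns 1 when the feature terms of both fragment comparisons already
 * exceed the squared cutoff, so the steric calculation can be skipped.
 * Assumption: cutoff is the largest distance still reported as a hit.
 */
int features_rule_out_hit(double feat_sq_first, double feat_sq_second,
                          double cutoff)
{
    assert(cutoff >= 0.0);
    return feat_sq_first + feat_sq_second > cutoff * cutoff;
}
```

A feature found 1.0 Å away, halfway between the two radii, therefore adds half of the maximum increment to the sum of squares.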

While the relative weight of each feature's contribution to the field can be varied, in the 
basic method, an attempt is made to match all features in a query with the nearest feature of the 
same class in the database molecule. This is similar to a pharmacophore type match, but there 
is no concern with matching interfeature distances in the topomeric conformation. Further, 
unlike standard pharmacophore searching, the user is able to assign adjustable penalties in the 
event that an exact match is not possible. For instance, a nearby spatial match of one type of 

R. CRAMER, JILEK, LIU, GUESSREGEN, WENDT, AND K. CRAMER 



feature might be more acceptable to the user than a nearby spatial match of another feature. The 
distance penalty for the spatially mismatched first feature could be set much lower than for a 
spatial mismatch of the second feature. The features method also permits handling of 
situations where a feature is present in a database molecule but not in the query molecule. In 
standard pharmacophore technique, this situation would lead to a total mismatch. However, in 
the present method the user can assign a distance (similarity) penalty for the absence of the 
match to the query, but need not totally ignore either the overall shape of the query or the 
contribution of the other features in judging the similarity of the structures. 
Partial Feature Matching: 

It is recognized that very frequently the binding of small molecules to receptors is highly 
dependent on the interaction between hydrogen-bond-donating and hydrogen-bond-accepting 
atoms. For partial feature matching, the search for charged groups and aromatic rings may be 
turned off. A large penalty (10,000 units) is applied for donors and acceptors which do not 
align. In addition, the number of donor or acceptor matches required can be varied. This 
capability is included since it is recognized that frequently only 2 or 3 groups are required to 
make a small molecule active. For partial feature matching, all the hydrogen-bond-donating and 
hydrogen-bond-accepting features are examined but only those generating the lowest 2 or 3 
distances (including applicable penalties) across all (A:C, A:D, B:C, & B:D) the fragment 
comparisons for the compounds are used. 
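
Selecting only the lowest 2 or 3 donor and acceptor contributions can be sketched as a sort followed by a partial sum. The sketch below is illustrative only; the function names are hypothetical, and it is an assumption that the retained contributions are simply summed into the dissimilarity sum of squares. Note that the array of penalties is sorted in place.

```c
#include <assert.h>
#include <stdlib.h>

/* Ascending comparison of two doubles for qsort. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *) a;
    double y = *(const double *) b;
    return (x > y) - (x < y);
}

/*
 * Sum of the keep smallest donor/acceptor penalties gathered across the
 * fragment comparisons. If keep exceeds n, all n penalties are summed.
 */
double partial_match_sq(double *penalties, size_t n, size_t keep)
{
    double sum = 0.0;
    size_t i;

    assert(penalties != NULL && n > 0);
    qsort(penalties, n, sizeof penalties[0], cmp_double);
    if (keep > n)
        keep = n;
    for (i = 0; i < keep; i++)
        sum += penalties[i];
    return sum;
}
```

With keep set to 2, two well aligned groups are enough to give a low feature contribution even when every other donor or acceptor carries the full 10,000 unit penalty.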

A further variation of the partial feature matching method considers the situation where 
the user determines that there is only one feature which is most important to match. If that 
feature is present and properly located, there is no penalty, the field differences are zero and the 
similarity is great. The flip side of single feature matching is that if the feature doesn't match, 
a very large penalty is imposed to clearly yield a large difference (greater distance and low 
similarity). 

Feature matching has been found to greatly increase the effectiveness of the heterogeneous 
database searching since it complements the shape specific searching. Use of both steric shape 
searching and feature searching of topomerically aligned fragments has been found to be as 
good as or better than any equivalent 2D searching with fingerprints which has been, until now, 
the gold standard of searching technologies. In addition, the results of shape and feature 
similarity searching yield actual molecular structures which chemists recognize as being 
members of the same class of compounds. Also, unlike fragment searching, molecular structures 
are clearly identified which can serve as bases for continued development. 

The method of the present invention for the first time permits the three dimensional 
searching of a heterogeneous compound database for compounds that are likely to have the same 
biological activity as a query molecule. The results identify molecular structures having similar 
shape properties, and, when used with features, similar pharmacophore properties. The 
identification of the structural fragments which contribute to the identified similarity provides an 
insight into the shape requirements of the receptor, and just as importantly, into likely additional 
molecular structures and corresponding shapes which will likely share the same activity. Thus, 
lead development is more straightforward from a knowledge of the relevant shape characteristics 
of the fragments provided by the method of this patent disclosure than from any two dimensional 
searching technique. 
Output: 

The most commonly used output reports the single best match between the query 
molecule and all molecules in the heterogeneous database. The two or three piece fragment which 
was responsible for the match is also reported. A variation of the output displays the fragment 
of the best hits and the query fragment that it matches. One can also ask the system to list all 
hits with field differences less than some value; in other words a list of the most similar 
molecules. 

The software code written in the C language contained in the Appendices implements all 
the capacities of the present invention. The CT_TOP.C code provides all the calculation 
functionalities. DBTOP.C contains the command line interface, the user inputs, code to read the 
input structures, calls to the CT_TOP.C routines, and output interface. CT_TOP.H lists all the 
required data structures used. The code needs to be compiled by a standard C compiler before 
being run as is well understood in the art. All together, all code necessary to fully disclose an 
enabling embodiment of the invention in the computational chemistry environment specified 
earlier is set forth in the Appendices. 

From the preceding description of the construction, generation, and searching of a 
heterogeneous database of molecules, it should be clear that there are many variations which 
may be employed and, having taught how to generate and search one specific embodiment, all 
equivalent embodiments are considered within the scope of this disclosure. 

While the preceding written description is provided as an aid in understanding, it should 
be understood that the source code listings appended to this application constitute a complete 
disclosure of the best mode currently known to the inventors of the methods of heterogeneous 
database searching. 

Thus, while this invention has been particularly described with reference to the drug lead 
identification art, it is clear that the validation of molecular structural descriptors and their use 
in selecting structurally diverse sets of chemical compounds can be applied anywhere a large 
number of compounds is encountered from which a representative subset is desired. Since the 
implications and advances in the art provided by the methods of this invention are still so new, 
the entire range of possible uses for the methods of this invention can not be fully described at 
the present time. However, such as yet unidentified uses are considered to fall under the teachings 
and claims of this invention if validated molecular structural descriptors are employed to 
characterize the diversity of molecules. 
#include <stdio.h> 
#include <stdlib.h> 
#include <string.h> 
#include <malloc.h> 
#include <ctype.h> 
#include <time.h> 
#include <memory.h> 
#include "ct.h" 
#include "ct_proto.h" 
#include "import_proto.h" 
#include "utl_mem.h" 
#include "utl_scan.h" 
#include "utl_set.h" 
#include "comfa.h" 
#include "parseopt.h" 
#include "ct_top.h" 


/* Option variables */ 
static char *hitlist; 
static char *UnityDatabase; 
static char *UnitySetName; 
static char *QueryFileName; 
static char *queryDetailFileName; 
static double radius = 120.0; 
static int min_atoms = -1; 
static int AllowTerminalAtoms = -1; 
static double reductionFactor = 0.85; 
static double attachmentFactor = -1.0; 
static double max_attachpen = 100.0; /* 2x attachmentFactor - about 2 angstroms */ 
static double featureFactor = 1.0; 
static double extraFeaturePenalty = 0.1; 
static int stericPivot = 30; 
static int partialMatch = 0; 
static int useFallback = 1; 
static int do2piece = 1; 
static int do3piece = 1; 
static int doSubset = 0; /* query 2 piece, with structure 3 piece */ 
static int minHevSubset = -1; /* -1 means to auto adjust, 4 hev atoms less than query */ 
static int minHev = -1; 
static int maxHev = -1; 
static int hevDiff = -2; 
static int normalize = -1; 
static int max_hits = 0; 
static int useFeatureCharges = 1; 
static char *str_featureWeights; 
static char *OutputFileName; 
static char *report_modes[] = { "tsv", "tsvd", "regid", "sln", "detail", "core", "matrix", (char *) 0 }; 
static char *region_modes[] = { "normal", "big", "huge", (char *) 0 }; 
static char *feature_modes[] = { "unitypref", "unity", "topomer", (char *) 0 }; 
static FeatureSetName featureSet = UseTopomerFeatures; 
static int reportWarnings = 1; 
static int regionMode; 
static double stepSize = 2.0; 
static int debugLevel = 2; 
static char *debugFileName; 
static int res_alloc; 

static double *parseFeatureWeights(char *sptr ); 
int token_string(char *str, char token, int maxtoks, int skipMult, char **tokens ); 
static int DoCoreSearching( struct CtConnectionTable *qct, FILE *infp, FILE *outfp ); 
static int TriposSponge(int cnt); 
static double getLoad(char *line); 

typedef enum 
{ 

ReportTSV, 
ReportTSVD, 
ReportRegid, 
ReportSln, 
ReportDetail, 
ReportCore, 
ReportMatrix, 
ReportBrief, 
ReportStats, 
} ReportMode; 


static ReportMode rmode; 
/* 
WARNING: If you add or subtract options before -report adjust REPORT_OFFSET accordingly. 
*/ 

#define FEATURE_SET_OFFSET 15 
#define REPORT_OFFSET 28 

static struct ParseOptions Options[] = { 
    { "hitlist", ParseOptString, &hitlist, 
      "Name of a sln hitlist containing structures to search with 3D coordinates." }, 
    { "database", ParseOptString, &UnityDatabase, 
      "Name of a Sybyl/3DB database\n\tWithout -database or -hitlist stdin is used." }, 
    { "use_subset", ParseOptString, &UnitySetName, 
      "Name of selection set to use vs entire database." }, 
    { "query", ParseOptString, &QueryFileName, 
      "Name of a file containing the query structure.\n\nField Options\n" }, 
    { "distance", ParseOptDouble, &radius, 
      "maximum shape units distance to report as a hit, default is 120." }, 
    { "stericpivot", ParseOptInt, &stericPivot, 
      "autoscale steric pivot point. Queries having fewer than N heavy atoms are more sensitive to steric differences,\n\t\t0 is disabled. Default 30." }, 
    { "partialmatch", ParseOptInt, &partialMatch, 
      "donor and acceptor partial match. The lowest N HBD/HBA feature penalties contribute to the distance,\n\t\t0 is disabled. Default is 0" }, 
    { "minatoms", ParseOptInt, &min_atoms, 
      "minimum number of HEV atoms per fragment, default is 4. (a negative value sets the minimum number of 2piece splits)" }, 
    { "terminal", ParseOptBoolean, &AllowTerminalAtoms, 
      "Use +terminal to enable the counting of terminal atoms, default -terminal " }, 
    { "hevdiff", ParseOptInt, &hevDiff, 
      "Maximum allowed heavy atom count difference to compare compounds,\n\t\tdefault 12 inclusive, 30 with +subset, -1 means disabled." }, 
    { "hevmin", ParseOptInt, &minHev, 
      "Minimum number of heavy atoms required in structure to search. Default 10\n" }, 
    { "hevmax", ParseOptInt, &maxHev, 
      "Maximum number of heavy atoms allowed in structure to search. Default 80\n" }, 
    { "attach", ParseOptDouble, &attachmentFactor, 
      "attachment penalty factor for 3 piece comparisons, default 10.0, 50 for core mode" }, 
    { "maxattach", ParseOptDouble, &max_attachpen, 
      "maximum attachment penalty for core searching -report core, default 100.0 " }, 
    { "feature", ParseOptDouble, &featureFactor, 
      "Feature scaling factor, default 1.0" }, 
    { "usefeatureset", ParseOptEnum, feature_modes, 
      "Default is topomer" }, 
    { "charge", ParseOptBoolean, &useFeatureCharges, 
      "use -charge to disable charge group features, they have a high default penalty " }, 
    { "weight", ParseOptString, &str_featureWeights, 
      "Comma separated list of 5 feature weights, aromatic, pos charge groups, neg, HBA HBD,\n\t\tdefault 20,200,200,100,100 " }, 
    { "extra", ParseOptDouble, &extraFeaturePenalty, 
      "Extra feature penalty factor applied to feature weight, default 0.1 " }, 
    { "arom", ParseOptBoolean, &normalize, 
      "Default is false for database, true otherwise -arom disables +arom enables " }, 
    { "agscale", ParseOptDouble, &reductionFactor, 
      "Aggregate scaling factor for rotatable bonds, default 0.85." }, 
    { "2piece", ParseOptBoolean, &do2piece, 
      "Use -2piece to disable 2 piece comparisons." }, 
    { "3piece", ParseOptBoolean, &do3piece, 
      "use -3piece to disable 3 piece comparisons." }, 
    { "subset", ParseOptBoolean, &doSubset, 
      "use +subset to enable subset searching.\n\t\tQuery is allowed to hit larger structure containing a portion of the 2 piece fragmentation." }, 
    { "stepsize", ParseOptDouble, &stepSize, 
      "Step size of the grid points, default 2.0, lower values take longer" }, 
    { "fallback", ParseOptBoolean, &useFallback, 
      "Use -fallback to disable using smaller minimum atoms when no splitting occurs.\n\nOutput Options\n" }, 
    { "besthits", ParseOptInt, &max_hits, 
      "Will report the compounds with the N lowest shapeunit scores less than or equal to the -shapeunits value." }, 
    { "output", ParseOptString, &OutputFileName, 
      "Will report results to this filename, default is stdout." }, 
    { "report", ParseOptEnum, report_modes, 
      "Reporting mode, default is TSV " }, 
    { "qdetail", ParseOptString, &queryDetailFileName, 
      "write query fragments to this filename." }, 
    { "debugFile", ParseOptString, &debugFileName, 
      "write debugging information to this file, CAUTION: creates extensive amount of information per compound" }, 
}; 

/* static variables */ 
static top_result **result_root; 
static int result_idx; 
static int cnt = 0; 
static int nhit = 0; 
static time_t tnow; 

/* local functions */ 
static FILE *open_input_source(char *unitydb, char *setname, char *hitlist, int *r_ispipe ); 
static void saveResult(top_result *res, int max_hits, double *r_radius ); 
static int top_result_compare(const void *vnrec, const void *vtrec ); 
static void formatTSV(FILE *fp, struct CtConnectionTable *ct, double comfa_diff, int idx); 
static int formatDetail(FILE *fp, top_result *res, int reportHitFrags ); 
static void formatTSVD(FILE *fp, top_result *res ); 
static void formatRegid(FILE *fp, struct CtConnectionTable *ct, int idx); 
static void writeDetailHeader(FILE *fp, ReportMode rmode); 
static void writeTSVDHeader(FILE 
static int echo_hitlistLine(char *line); 
static void setAttr(struct CtConnectionTable *ct, char *name, char *value ); 
static void writeQueryDetails(char *fname ); 

#if 0 
#define CACHE_COUNTERS 1 
#endif 

int main(int argc, char *argv[] ) 
{ 
    FILE *outfp; 
    FILE *in_fp; 
    FILE *qfp; 
    FILE *dfp = (FILE *) 0; 
    int isPipe; 
    int i; 
    struct CtConnectionTable *ct; 
    struct CtConnectionTable *qct; 
    struct CtConnectionTable *core_qct; 
    char *tptr; 
    char *sln; 
    char *regid; 
    int t_frags, t_2compare, t_3compare, t_fcompare, t_filtered, t_feat; 
    int nargs; 
    double comfa_diff; 
    int filtered; 
    top_result *res; 
    double *cord; 
    int natoms; 
    int noCordCnt = 0; 
    int mixtures = 0; 
    int nParts; 
    int keepCts; 
    top_result *rptr; 
    double outsidePerc; 
    int queryHevCount; 
    int strHevCount; 
    int realHevCount; 
    int hevFiltered = 0; 
    int strHevDiff; 
    double *myFeatureWeights; 
#ifdef CACHE_COUNTERS 
    int e0, e1; 
    long long c0, c1; 
#endif 

#ifdef M_MXFAST 
    mallopt(M_MXFAST, 128); 
#endif 
#ifdef M_BLKSZ 
    mallopt(M_BLKSZ, 16*1024); 
#endif 
#ifdef M_FREEHD 
    mallopt(M_FREEHD, 1); 
#endif 
#ifdef M_MXCHK 
    mallopt(M_MXCHK, 100000); 
#endif 
    nargs = UTL_PARSE_OPT( argc, argv, sizeof(Options) / sizeof(struct ParseOptions), Options 
    if ( !nargs ) 
        return -1; 

#if 0 
    if ( !LM_STANDALONE_INIT() ) 
    { 
        fprintf(stderr, "License initialization failed.\n"); 
        return -1; 
    } 
    if ( !LM_STANDALONE_VALID_LICENSE("QSAR") ) 
    { 
        fprintf(stderr, "A valid QSAR license is required.\n"); 
        return -1; 
    } 
#endif 

    rmode = ReportTSV;
    if ( Options[REPORT_OFFSET]._explicit )
    {
        tptr = *((char **) Options[REPORT_OFFSET].value);
        if ( !strcmp(tptr, "tsv") )
            rmode = ReportTSV;
        else if ( !strcmp(tptr, "tsvd") )
            rmode = ReportTSVD;
        else if ( !strcmp(tptr, "regid") )
            rmode = ReportRegid;
        else if ( !strcmp(tptr, "detail") )
            rmode = ReportDetail;
        else if ( !strcmp(tptr, "sln") )
            rmode = ReportSln;
        else if ( !strcmp(tptr, "core") )
            rmode = ReportCore;
        else if ( !strcmp(tptr, "matrix") )
            rmode = ReportMatrix;
        else
        {
            fprintf(stderr, "Not a valid reporting option: %s\n", tptr );
            return -1;
        }
    }

    if ( Options[FEATURE_SET_OFFSET]._explicit )
    {
        tptr = *((char **) Options[FEATURE_SET_OFFSET].value);
        if ( !strcmp(tptr, "topomer") )
            featureSet = UseTopomerFeatures;
        else if ( !strcmp(tptr, "unity") )
            featureSet = UseUnityFeatures;
        else
            featureSet = UsePreferredUnityFeatures;
        fprintf(stderr, "Using %s feature set %d\n", tptr, featureSet );
    }

    if ( hevDiff == -2 )
    {
        if ( rmode == ReportCore )
            hevDiff = -1;
        else if ( doSubset )
            hevDiff = 30;
        else
            hevDiff = 12;
    }

    if ( minHev == -1 )
    {
        if ( rmode == ReportCore )
            minHev = 1;
        else if ( doSubset )
            minHev = 10;
        else
            minHev = 10;
    }

    if ( maxHev == -1 )
    {
        if ( rmode == ReportCore )
            maxHev = 1000;
        else if ( doSubset )
            maxHev = 80;
        else
            maxHev = 80;
    }

    if ( attachmentFactor == -1 )
    {
        if ( rmode == ReportCore )
            attachmentFactor = 50.0;
        else
            attachmentFactor = 10.0;
    }

    if ( min_atoms == -1 )
    {
        if ( rmode == ReportCore )
            min_atoms = 1;
        else
            min_atoms = 4;
    }

    if ( AllowTerminalAtoms == -1 )
    {
        if ( rmode == ReportCore )
            AllowTerminalAtoms = 1;
        else
            AllowTerminalAtoms = 0;
    }

    if ( normalize == -1 ) /* User didn't specify, so auto select based upon input type */
    {
        if ( UnityDatabase )
            normalize = 0;
        else
            normalize = 1;
    }

    if ( !UnityDatabase && !normalize )
        fprintf(stderr, "\nWARNING: Make sure structures in hitlist are in aromatic and standardized form when using -arom\n\n" );
#if 0
    if ( Options[REGION_OFFSET]._explicit )
    {
        tptr = *((char **) Options[REGION_OFFSET].value);
        if ( !strcmp(tptr, "normal") )
            regionMode = 0;
        else if ( !strcmp(tptr, "big") )
            regionMode = 1;
        else if ( !strcmp(tptr, "huge") )
            regionMode = 2;
        else
        {
            fprintf(stderr, "not a valid region mode:%s\n", tptr );
            return -1;
        }
    }
#endif


    if ( stepSize < 1.5 || stepSize > 2.5 )
    {
        fprintf(stderr, "You must be kidding on this stepsize. Please keep between 1.5 and 2.5\n" );
    }

#if 0 

TOP_STER_REGION_MODE(regionMode) ; 

#endif 
#if 0
    if ( rmode != ReportTSV )
    {
        fprintf(stderr, "other report options not supported, see -debugFile\n");
        fprintf(stderr, "What formatting options do you want?\n" );
        goto bailout;
    }
#endif

    if ( !QueryFileName && rmode != ReportMatrix )
    {
        fprintf(stderr, "No query file specified\n");
        return -1;
    }

    qfp = (FILE *) 0;
    if ( rmode != ReportMatrix )
    {
        qfp = fopen(QueryFileName, "r");
        if ( !qfp )
        {
            fprintf(stderr, "Failed to open query file:%s\n", QueryFileName );
            return -1;
        }
    }

if ( debugFileName ) 
{ 

dfp = fopen(debugFileName,"w"); 

if(dfp) 

{ 

fprintf(dfp,"#SYBYL/3DB HITLIST\n#@CLASS STRLIST\n"); 
fprintf(dfp,"#@FIELD TS_SID INT\n"); 
fprintf(dfp,"#@FIELD TS_QID INT\n"); 

} 

} 

    if ( str_featureWeights )
        myFeatureWeights = parseFeatureWeights(str_featureWeights);
    else
        myFeatureWeights = (double *) 0;

in_fp = open_input_source(UnityDatabase, UnitySetName, hitlist, &isPipe ); 
if ( !in_fp) 

{ 

return -1; 

} 

    if ( OutputFileName )
    {
        outfp = fopen(OutputFileName, "w");
        if ( !outfp )
        {
            fprintf(stderr, "Failed to open %s for output\n", OutputFileName );
            goto bailout;
        }
    }
    else
        outfp = stdout;

    keepCts = 0;
    if ( rmode == ReportDetail )
        keepCts = 1;

    if ( rmode == ReportDetail || rmode == ReportSln )
    {
        writeDetailHeader(outfp, rmode );
    }
    else if ( rmode == ReportTSVD )
        writeTSVDHeader(outfp);
    else if ( rmode == ReportTSV )
        fprintf(outfp, "TOPSIM\n");

    qct = (struct CtConnectionTable *) 0;
    while ( qfp && !qct && UTL_SCAN_GETS(qfp, "\\", (char *) 0, &sln ) > 0 )
    {
        if ( *sln == '#' )
            continue;
        qct = DB_IMPORT_SLN(sln);
        if ( qct )
            queryHevCount = TOP_HEV_COUNT(qct);
    }

    if ( qfp && !qct )
    {
        fprintf(stderr, "No query contained in :%s\n", QueryFileName );
bailout:
        if ( isPipe )
            pclose(in_fp);
        return -1;
    }

    if ( rmode == ReportCore )
    {
        core_qct = qct;
        qct = (struct CtConnectionTable *) 0;
    }

    if ( TOP_QUERY_OPTIONS(qct, do2piece, do3piece, doSubset, min_atoms, stericPivot,
            partialMatch, AllowTerminalAtoms, useFallback, hevDiff, 0, reductionFactor,
            featureFactor, attachmentFactor, stepSize, featureSet, useFeatureCharges,
            myFeatureWeights, extraFeaturePenalty, dfp, debugLevel) && qct )
    {
        fprintf(stderr, "Failed to setup topomer searching for query.\n");
        fprintf(stderr, "Most likely no 3D coordinates or cannot split query.\n");
        goto bailout;
    }

    if ( rmode == ReportCore )
    {
        DoCoreSearching(core_qct, in_fp, outfp );
        qct = core_qct;
        goto closeup;
    }

    if ( rmode == ReportMatrix )
    {
        DoMatrixSearching(in_fp, outfp);
        goto closeup;
    }

    if ( qct && queryDetailFileName )
    {
        writeQueryDetails(queryDetailFileName);
    }

#ifdef CACHE_COUNTERS
    e0 = 1;
    e1 = 25; /* 26 L2 data cache, 25 L1 data cache, see perfex */
    start_counters(e0, e1);
#endif

    while ( UTL_SCAN_GETS(in_fp, "\\", (char *) 0, &sln ) > 0 )
    {
        if ( *sln == '#' )
        {
            if ( rmode == ReportDetail && echo_hitlistLine(sln) )
                DB_CT_SLN_WRITE(outfp, sln );
            continue;
        }

        cnt++;
        ct = (struct CtConnectionTable *) 0;
        if ( hevDiff >= 0 )
        {
            strHevCount = slnHevCount(sln);
            strHevDiff = queryHevCount - strHevCount;
            if ( strHevDiff < 0 )
                strHevDiff *= -1;
            if ( strHevDiff > hevDiff || strHevCount < minHev || strHevCount > maxHev )
                hevFiltered++;
            else
                ct = DB_IMPORT_SLN(sln);
        }
        else
            ct = DB_IMPORT_SLN(sln);
        if ( !(cnt % 1000) )
        {
#ifdef CACHE_COUNTERS
            read_counters(e0, &c0, e1, &c1 );
            start_counters(e0, e1);
            fprintf(stderr, "cache miss rate: %8.3lf\n",
                (double) ( ( (long double) c1 / (long double) c0 ) ) * 10000.0 );
#endif
#ifdef TRIPOS_VERSION
            TOP_GET_STATS(!(cnt % 10000), &t_frags, &t_2compare, &t_3compare,
                &t_fcompare, &t_filtered, &t_feat, &outsidePerc);
#else
            TOP_GET_STATS(0, &t_frags, &t_2compare, &t_3compare, &t_fcompare,
                &t_filtered, &t_feat, &outsidePerc);
#endif
#if 0
            if ( outsidePerc > 10.0 )
            {
                fprintf(stderr, "Warning %8.4lf percent of the fields evaluated have atoms outside the field, try using a larger field\n",
                    outsidePerc );
            }
#endif
            time(&tnow);
            fprintf(stderr, "hit %3d of %4d filtered %4d (%d+%d+%d+%d, No3D+Mix+Hev+Feat) out:%6.3lf Avg Frags: %7.2lf & Comparisons: %7.2lf %s",
                nhit, cnt, noCordCnt + mixtures + hevFiltered + t_feat, noCordCnt,
                mixtures, hevFiltered, t_feat, outsidePerc,
                (double) t_frags / (double) cnt, (double) t_fcompare / (double) cnt,
                ctime(&tnow) );
#if 0
            fprintf(stderr, "completed: %d no3D: %d mixtures: %d frags: %d comparisons: %d %d %d %8.4lf %8.4lf %8.4lf %8.4lf\n",
                cnt, noCordCnt, mixtures,
                t_frags, t_2compare, t_3compare, t_fcompare,
                (double) t_frags / (double) cnt,
                (double) t_2compare / (double) cnt,
                (double) t_3compare / (double) cnt,
                (double) t_fcompare / (double) cnt );
#endif
        }

        if ( !ct )
            continue;

        cord = (double *) 0;
        DB_CT_GET_CT_ATTR(ct, CtCt3DCoordSet, &cord, &natoms );
        if ( !cord )
        {
            DB_CT_DELETE_CT(ct);
            if ( dfp )
                fprintf(dfp, "# compound %d missing coordinates\n", cnt );
            noCordCnt++;
            continue;
        }

        DB_CT_UTL_COUNT_FRAGS(ct, 0, (int *) 0, 0, (int *) 0, &nParts );
        if ( nParts != 1 )
        {
            DB_CT_DELETE_CT(ct);
            mixtures++;
            continue;
        }

        if ( normalize )
        {
            DB_CT_NORM_AROM(ct);
            DB_CT_STANDARD(ct, (int *) 0);
            UTL_ERROR_CLEAR();
        }

        if ( max_hits > 0 )
        {
            res = TOP_COMPARE_WDETAIL(ct, radius, cnt, keepCts);
            if ( res )
            {
                nhit++;
                saveResult(res, max_hits, &radius );
            }
            else
                DB_CT_DELETE_CT(ct);
        }

        else if ( rmode == ReportDetail || rmode == ReportTSVD || rmode == ReportSln )
        {
            res = TOP_COMPARE_WDETAIL(ct, radius, cnt, keepCts );
            if ( res )
            {
                nhit++;
                if ( rmode == ReportDetail )
                    formatDetail(outfp, res, 1 );
                else if ( rmode == ReportSln )
                    formatDetail(outfp, res, 0 );
                else
                    formatTSVD(outfp, res);
                TOP_FREE_RESULT(res, 1);
            }
            DB_CT_DELETE_CT(ct);
        }
        else
        {
            comfa_diff = TOP_COMPARE( ct, radius, &filtered, cnt );
            if ( comfa_diff >= 0.0 && ( comfa_diff <= radius || radius < 0.0 ) )
            {
                nhit++;
                if ( rmode == ReportTSV )
                    formatTSV(outfp, ct, comfa_diff, cnt );
                else /* if ( rmode == ReportRegid ) */
                    formatRegid(outfp, ct, cnt );
            }
            DB_CT_DELETE_CT(ct);
        }
    }

#ifdef TRIPOS_VERSION
    TOP_GET_STATS(1, &t_frags, &t_2compare, &t_3compare, &t_fcompare, &t_filtered, &t_feat,
        &outsidePerc);
#else
    TOP_GET_STATS(0, &t_frags, &t_2compare, &t_3compare, &t_fcompare, &t_filtered, &t_feat,
        &outsidePerc);
#endif

    time(&tnow);
    fprintf(stderr, "hit %3d of %4d filtered %4d (%d+%d+%d+%d, No3D+Mix+Hev+Feat) out:%6.3lf Avg Frags: %7.2lf & Comparisons: %7.2lf %s",
        nhit, cnt, noCordCnt + mixtures + hevFiltered + t_feat, noCordCnt, mixtures,
        hevFiltered, t_feat, outsidePerc,
        (double) t_frags / (double) cnt, (double) t_fcompare / (double) cnt,
        ctime(&tnow) );

    if ( max_hits > 0 )
    {
        if ( result_idx > 1 && result_idx != max_hits )
            qsort( (void *) result_root, (size_t) result_idx, (size_t) sizeof(top_result *),
                top_result_compare );
        for ( i = 0; i < max_hits && i < result_idx; i++ )
        {
            res = result_root[i];
            if ( !res )
                continue;
            if ( rmode == ReportTSV )
                formatTSV(outfp, res->ct, res->comfa_diff, res->idx);
            else if ( rmode == ReportTSVD )
                formatTSVD(outfp, res );
            else if ( rmode == ReportRegid )
                formatRegid(outfp, res->ct, res->idx );
            else if ( rmode == ReportDetail )
                formatDetail(outfp, res, 1 );
            else if ( rmode == ReportSln )
                formatDetail(outfp, res, 0 );
        }
    }

    for ( i = 0; i < res_alloc; i++ )
    {
        rptr = result_root[i];
        if ( !rptr )
            continue;
        if ( rptr->ct )
            DB_CT_DELETE_CT(rptr->ct);
        TOP_FREE_RESULT(rptr, 1);
        result_root[i] = (top_result *) 0;
    }

closeup:
    if ( qct )
        DB_CT_DELETE_CT(qct);
    if ( isPipe )
        pclose(in_fp);
    else if ( in_fp != stdin )
        fclose(in_fp);
    if ( dfp )
        fclose(dfp);
    if ( outfp != stdout )
        fclose(outfp);
    if ( rmode != ReportMatrix )
        dumpfragstats();
    return 0;
}
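
Inside the main loop above, each candidate is screened on its heavy-atom count before the more expensive SLN import is attempted. The following is a minimal sketch of that screen as a standalone predicate. The function name passes_hev_filter and its parameter list are illustrative assumptions and do not appear in the listing; only the comparison logic mirrors main().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the original listing).
   Returns 1 when a candidate should be imported and compared,
   0 when it should be counted as heavy-atom filtered. */
static int passes_hev_filter(int queryHev, int strHev,
                             int hevDiff, int minHev, int maxHev)
{
    assert(minHev <= maxHev);

    /* A negative hevDiff disables the screen, as in main(). */
    if (hevDiff < 0)
        return 1;

    /* Reject candidates whose size differs too much from the query. */
    if (abs(queryHev - strHev) > hevDiff)
        return 0;

    /* Reject candidates outside the absolute size window. */
    if (strHev < minHev || strHev > maxHev)
        return 0;

    return 1;
}
```

With the non-core defaults shown earlier (hevDiff 12, minHev 10, maxHev 80), a 20-heavy-atom query would keep a 25-atom candidate and drop a 40-atom one.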


static FILE *open_input_source(char *unitydb, char *setname, char *hitlist, int *r_ispipe )
{
    char *command;
    int len;
    FILE *fp;

    if ( unitydb )
    {
        len = strlen(unitydb) + 128;
        if ( setname )
            len += strlen(setname);
        command = malloc(len);

        if ( setname )
            sprintf(command, "dbexport -database %s -use_set %s -query regid+coords -visual '*'",
                unitydb, setname );
        else
            sprintf(command, "dbexport -database %s -query regid+coords -visual '*'",
                unitydb );

        fp = popen(command, "r");
        if ( !fp )
            fprintf(stderr, "Failed to start the command :\n%s\n", command );
        else
            *r_ispipe = 1;
        free(command);
        return fp;
    }

    if ( hitlist && strcmp(hitlist, "-") )
    {
        fp = fopen(hitlist, "r");
        if ( !fp )
            fprintf(stderr, "Failed to open the hitlist: %s\n", hitlist );
        *r_ispipe = 0;
        return fp;
    }

    *r_ispipe = 0;
    return stdin;
}


static int top_result_compare(const void *vnrec, const void *vtrec )
{
    top_result **n = (top_result **) vnrec;
    top_result **t = (top_result **) vtrec;
    double cdiff;

    cdiff = (*n)->comfa_diff - (*t)->comfa_diff;
    if ( cdiff > 0.0 )
        return 1;
    else if ( cdiff < 0.0 )
        return -1;
    return (*t)->idx - (*n)->idx;
}

static void saveResult(top_result *res, int max_hits, double *r_radius )
{
    static int res_max;
    top_result *rptr;
    int i;
    static char *suffix[] = { "th", "st", "nd", "rd" };
    int sidx;

    if ( !result_root )
    {
        res_max = max_hits;
        res_alloc = max_hits + 5 + max_hits / 10; /* a little extra */
        result_root = (top_result **) calloc(res_alloc, sizeof(top_result *) );
    }

    if ( res )
    {
        result_root[result_idx] = res;
        result_idx++;
        if ( result_idx == res_alloc )
        {
            qsort( (void *) result_root, (size_t) res_alloc, (size_t) sizeof(top_result *),
                top_result_compare );
            for ( i = res_max; i < res_alloc; i++ )
            {
                rptr = result_root[i];
                if ( !rptr )
                    continue;
                if ( rptr->ct )
                    DB_CT_DELETE_CT(rptr->ct);
                TOP_FREE_RESULT(rptr, 1);
                result_root[i] = (top_result *) 0;
            }

            result_idx = res_max; /* start finding a few more to add in */
            rptr = result_root[res_max-1];
            if ( *r_radius && *r_radius > 0.0 && rptr->comfa_diff < *r_radius )
            {
                sidx = 0;
                if ( res_max < 4 )
                    sidx = res_max;
                fprintf(stderr, "%d%s lowest shape distance: %8.2lf old: %8.2lf after: %d\n",
                    res_max, suffix[ sidx ],
                    rptr->comfa_diff, *r_radius, cnt );
                *r_radius = rptr->comfa_diff;
            }
        }
    }
}
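
saveResult() above keeps only the best max_hits results by buffering hits in an array with some slack, sorting and trimming when the buffer fills, and then tightening the search radius to the worst retained score. The sketch below isolates that idea. It is a simplification under stated assumptions: the TopN type and the topn_init and topn_add names are hypothetical, and bare scores stand in for top_result pointers, so no result objects are freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the result buffer. */
typedef struct {
    double *scores;   /* buffered scores, unsorted between trims */
    int     count;    /* number of scores currently buffered */
    int     keep;     /* number of best (lowest) scores to retain */
    int     alloc;    /* buffer capacity: keep plus some slack */
} TopN;

/* Ascending order, so the lowest (best) distances come first. */
static int cmp_double(const void *a, const void *b)
{
    double d = *(const double *) a - *(const double *) b;
    return (d > 0.0) - (d < 0.0);
}

static int topn_init(TopN *t, int keep)
{
    assert(keep > 0);
    t->keep = keep;
    t->alloc = keep + 5 + keep / 10;   /* same slack rule as saveResult() */
    t->count = 0;
    t->scores = (double *) calloc((size_t) t->alloc, sizeof(double));
    return t->scores != NULL;
}

/* Adds a score. When the buffer fills, it is sorted, everything past the
   best `keep` entries is dropped, and *radius is lowered to the worst
   retained score so later comparisons can reject weaker hits early. */
static void topn_add(TopN *t, double score, double *radius)
{
    t->scores[t->count++] = score;
    if (t->count == t->alloc) {
        qsort(t->scores, (size_t) t->alloc, sizeof(double), cmp_double);
        t->count = t->keep;
        if (*radius > 0.0 && t->scores[t->keep - 1] < *radius)
            *radius = t->scores[t->keep - 1];
    }
}
```

Sorting only when the slack is exhausted amortizes the cost of qsort over several insertions, which is presumably why the listing allocates a little extra rather than re-sorting on every hit.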

static void setAttr(struct CtConnectionTable *ct, char *name, char *value )
{
    char *tval;

    tval = (char *) 0;
    DB_CT_GET_CT_ATTR(ct, CtCtUserValue, &tval, name );
    if ( tval )
        DB_CT_UTL_MOD_SIMPLE_CT_ATTR(ct, CtCtUserValue, value, name );
    else
        DB_CT_SET_CT_ATTR(ct, CtCtUserValue, value, name );
    UTL_ERROR_CLEAR();
}


static int formatDetail(FILE *fp, top_result *res, int reporthitFrags )
{
    char name[40];
    char value[40];
    int i;
    int noSub;
    struct CtConnectionTable *ct;

    if ( !fp || !res || !res->ct )
        return -1;

    ct = res->ct;
    sprintf(value, "%d", (int) res->comfa_diff );
    setAttr(ct, "TOPSIM", value);

    sprintf(value, "%d", (int) res->best2 );
    setAttr(ct, "TS_2P", value);

    sprintf(value, "%d", (int) res->best3 );
    setAttr(ct, "TS_3P", value );

    if ( doSubset )
    {
        sprintf(value, "%d", (int) res->bestSub );
        setAttr(ct, "TS_SUBSET", value );
    }

    if ( res->best3 < res->best2 )
        noSub = 3;
    else
        noSub = 2;

    if ( !reporthitFrags )
    {
        for ( i = 0; i < 3; i++ )
        {
            sprintf(value, "%d", res->qids[i] + 1 );
            sprintf(name, "TS_QID%d", i + 1 );
            setAttr(ct, name, value );

            sprintf(value, "%d", res->strids[i] + 1 );
            sprintf(name, "TS_SID%d", i + 1 );
            setAttr(ct, name, value );
        }

        for ( i = 0; i < noSub; i++ )
        {
            sprintf(value, "%8.4lf", res->hexDiffs[i] );
            sprintf(name, "TS_S%d", i + 1 );
            setAttr(ct, name, value );
        }

        for ( i = 0; i < noSub; i++ )
        {
            sprintf(value, "%8.4lf", res->featureDiffs[i] );
            sprintf(name, "TS_F%d", i + 1 );
            setAttr(ct, name, value );
        }
    }

    if ( res->attachmentPenalty != 0.0 )
    {
        sprintf(value, "%8.3lf", res->attachmentPenalty );
        setAttr(ct, "TS_ATTACH_PEN", value );
    }

    DB_CT_WRITE(fp, ct );

    if ( reporthitFrags )
    {
        for ( i = 0; i < noSub; i++ )
        {
            ct = res->strFrags[i];
            if ( !ct )
                continue;

            sprintf(value, "%8.4lf", res->hexDiffs[i] );
            setAttr(ct, "TS_STERIC", value );

            sprintf(value, "%8.4lf", res->featureDiffs[i] );
            setAttr(ct, "TS_FEATURE", value );

            sprintf(value, "%d", res->qids[i] + 1 );
            setAttr(ct, "TS_QID", value );

            sprintf(value, "%d", res->strids[i] + 1 );
            setAttr(ct, "TS_SID", value );

            sprintf(value, "%d", res->outside[i] );
            setAttr(ct, "TS_OUTR", value );

            DB_CT_WRITE(fp, ct );
        }
    }

    return 0;
}

static void formatTSV(FILE *fp, struct CtConnectionTable *ct, double comfa_diff, int idx)
{
    char *regid;

    regid = (char *) 0;
    if ( ct )
    {
        DB_CT_GET_CT_ATTR(ct, CtCtRegId, &regid );
        if ( !regid )
            DB_CT_GET_CT_ATTR(ct, CtCtName, &regid );
    }
    if ( regid )
        fprintf(fp, "%s\t%d\n", regid, (int) comfa_diff );
    else
        fprintf(fp, "Str%d\t%d\n", idx, (int) comfa_diff );
}

static void formatRegid(FILE *fp, struct CtConnectionTable *ct, int idx)
{
    char *regid;

    regid = (char *) 0;
    if ( ct )
        DB_CT_GET_CT_ATTR(ct, CtCtRegId, &regid ); /* Don't get name, only regid */
    if ( regid )
        fprintf(fp, "%s\n", regid);
    else
        fprintf(fp, "Str%d\n", idx);
}


static void formatTSVD(FILE *fp, top_result *res )
{
    char *regid;
    char tmpname[20];

    regid = (char *) 0;
    if ( res->ct )
    {
        DB_CT_GET_CT_ATTR(res->ct, CtCtRegId, &regid );
        if ( !regid )
            DB_CT_GET_CT_ATTR(res->ct, CtCtName, &regid );
    }

    if ( !regid )
    {
        sprintf(tmpname, "Str%d", res->idx );
        regid = tmpname;
    }

    if ( doSubset )
        fprintf(fp,
            "%s\t%d\t%d\t%d\t%d\t%8.4lf\t%8.4lf\t%8.4lf\t%8.4lf\t%8.4lf\t%8.4lf\t%8.4lf\n",
            regid,
            (int) res->comfa_diff, (int) res->best2, (int) res->best3, (int) res->bestSub,
            res->hexDiffs[0], res->hexDiffs[1], res->hexDiffs[2],
            res->attachmentPenalty,
            res->featureDiffs[0], res->featureDiffs[1], res->featureDiffs[2] );
    else
        fprintf(fp,
            "%s\t%d\t%d\t%d\t%8.4lf\t%8.4lf\t%8.4lf\t%8.4lf\t%8.4lf\t%8.4lf\t%8.4lf\n",
            regid,
            (int) res->comfa_diff, (int) res->best2, (int) res->best3,
            res->hexDiffs[0], res->hexDiffs[1], res->hexDiffs[2],
            res->attachmentPenalty,
            res->featureDiffs[0], res->featureDiffs[1], res->featureDiffs[2] );
}


static void writeDetailHeader(FILE *fp, ReportMode rmode)
{
    time(&tnow);

    fprintf(fp, "#SYBYL/3DB HITLIST\n#\n");
    fprintf(fp, "# Created: %s", ctime(&tnow) );
    fprintf(fp, "#\n#@CLASS STRLIST\n#\n");

    fprintf(fp, "#@FIELD TOPSIM\tINT\n");
    fprintf(fp, "#@FIELD TS_2P\tINT\n");
    fprintf(fp, "#@FIELD TS_3P\tINT\n");
    if ( doSubset )
        fprintf(fp, "#@FIELD TS_SUBSET\tINT\n");
    if ( rmode == ReportDetail )
    {
        fprintf(fp, "#@FIELD TS_STERIC\tDOUBLE\n");
        fprintf(fp, "#@FIELD TS_FEATURE\tDOUBLE\n");
        fprintf(fp, "#@FIELD TS_QID\tINT\n");
        fprintf(fp, "#@FIELD TS_SID\tINT\n");
        fprintf(fp, "#@FIELD TS_OUTR\tINT\n");
    }
    else
    {
        fprintf(fp, "#@FIELD TS_S1\tDOUBLE\n");
        fprintf(fp, "#@FIELD TS_S2\tDOUBLE\n");
        fprintf(fp, "#@FIELD TS_S3\tDOUBLE\n");
        fprintf(fp, "#@FIELD TS_F1\tDOUBLE\n");
        fprintf(fp, "#@FIELD TS_F2\tDOUBLE\n");
        fprintf(fp, "#@FIELD TS_F3\tDOUBLE\n");
        fprintf(fp, "#@FIELD TS_QID1\tINT\n");
        fprintf(fp, "#@FIELD TS_SID1\tINT\n");
        fprintf(fp, "#@FIELD TS_QID2\tINT\n");
        fprintf(fp, "#@FIELD TS_SID2\tINT\n");
        fprintf(fp, "#@FIELD TS_QID3\tINT\n");
        fprintf(fp, "#@FIELD TS_SID3\tINT\n");
    }

    fprintf(fp, "#@FIELD TS_ATTACH_PEN\tDOUBLE\n");
}

static void writeTSVDHeader(FILE *fp)
{
    if ( doSubset )
        fprintf(fp, "TOPSIM\tTS_2P\tTS_3P\tTS_SUBSET\tTS_S1\tTS_S2\tTS_S3\tTS_ATTACH_PEN\tFS_F1\tFS_F2\tFS_F3\n" );
    else
        fprintf(fp, "TOPSIM\tTS_2P\tTS_3P\tTS_S1\tTS_S2\tTS_S3\tTS_ATTACH_PEN\tFS_F1\tFS_F2\tFS_F3\n" );
}

static int echo_hitlistLine(char *line) 

{ 

char *tptr; 

static char *keep_fields[] = { "FIELD", "DATABASE", "QUERY", "CORE", (char *) 0 }; 
int i; 

    if ( *line != '#' || *(line+1) != '@' )
        return 0;

    tptr = line + 2;
    if ( !*tptr )
        return 0;

    for ( i = 0; keep_fields[i]; i++ )
{ 

if ( !strncmp(tptr,keep_fields[i], strlen(keep_fields[i] ) ) ) 
return 1; 

} 

return 0; 

} 
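
echo_hitlistLine() decides which header lines of an input hitlist are copied to the detail output by matching the keyword that follows the two-character directive prefix. The sketch below shows the same prefix-and-keyword test in isolation. The function name is_kept_directive is hypothetical, and the "#@" prefix is an assumption inferred from the "#@FIELD" and "#@CLASS" lines that the header-writing functions emit.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of the original listing).
   Returns 1 when `line` is a "#@" directive whose keyword starts with one
   of the NULL-terminated names in `keep`, otherwise 0. */
static int is_kept_directive(const char *line, const char *const *keep)
{
    int i;

    assert(line != NULL && keep != NULL);

    /* Short-circuit evaluation keeps line[1] from being read
       when the line is empty. */
    if (line[0] != '#' || line[1] != '@')
        return 0;

    for (i = 0; keep[i]; i++)
        if (strncmp(line + 2, keep[i], strlen(keep[i])) == 0)
            return 1;

    return 0;
}
```

Because the comparison is a prefix match, a keyword list containing "FIELD" would also accept a directive such as "#@FIELDS", which matches the behavior of the strncmp call in the listing.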


static void writeQueryDetails(char *fname )
{
    time_t tnow;
    FILE *fp;

    fp = fopen(fname, "w");
    if ( !fp )
    {
        fprintf(stderr, "Unable to write to query detail filename: %s\n", fname );
        return;
    }

    time(&tnow);
    fprintf(fp, "#SYBYL/3DB HITLIST\n#\n");
    fprintf(fp, "# Created: %s", ctime(&tnow) );
    fprintf(fp, "#\n#@CLASS STRLIST\n#\n");

    fprintf(fp, "#@FIELD TS_QID\tINT\n");
    TOP_QUERY_DUMP(fp, "TS_QID");
    fclose(fp);
}


static int slnHevCount(char *sln)
{
    char *tptr;
    int inbrace = 0;
    int hevCount = 0;

    tptr = sln;
    while ( *tptr )
    {
        if ( *tptr == '[' )
        {
            /* skip the bracketed attribute block, including quoted text */
            while ( *tptr && *tptr != ']' )
            {
                if ( *tptr == '"' )
                {
                    tptr++;
                    while ( *tptr && *tptr != '"' )
                        tptr++;
                    if ( *tptr )
                        tptr++;
                }
                else
                    tptr++;
            }
        }
        if ( isupper(*tptr) && *tptr != 'H' )
            hevCount++;
        if ( *tptr == '<' )
            return hevCount;
        tptr++;
    }
    return hevCount;
}

O 

static double *parseFeatureWeights(char *sptr )
{
    static double weights[6];
    char *tokens[7];
    int ntoks;
    int i;

    ntoks = token_string(sptr, ',', 6, 0, tokens );
    if ( ntoks != 5 )
    {
        fprintf(stderr, "Invalid argument to -weights, please specify 5 weights for arom,neg,pos,HBA,HBD \n");
        exit(-1);
    }

    for ( i = 0; i < 5; i++ )
    {
        weights[i] = atof(tokens[i]);
        if ( weights[i] < 0.0 )
            weights[i] = 0.0;
    }

    return weights;
}
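
parseFeatureWeights() expects exactly five comma-separated weights (arom, neg, pos, HBA, HBD) and clamps negative values to zero. The sketch below performs the same validation without modifying the input string. It is not the listing's method: it substitutes the standard strtod for the token_string and atof pair so that non-numeric text is rejected rather than read as zero, and the names parse_weights and NUM_WEIGHTS are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_WEIGHTS 5   /* arom, neg, pos, HBA, HBD */

/* Hypothetical helper (not part of the original listing).
   Parses exactly NUM_WEIGHTS comma-separated numbers from `text` into
   `out`, clamping negatives to 0.0. Returns 1 on success and 0 when the
   list is malformed or has the wrong number of entries. */
static int parse_weights(const char *text, double out[NUM_WEIGHTS])
{
    const char *p = text;
    char *end;
    int i;

    assert(text != NULL && out != NULL);

    for (i = 0; i < NUM_WEIGHTS; i++) {
        double w = strtod(p, &end);
        if (end == p)
            return 0;                 /* no number where one was expected */
        out[i] = (w < 0.0) ? 0.0 : w;
        p = end;
        if (i < NUM_WEIGHTS - 1) {
            if (*p != ',')
                return 0;             /* too few weights */
            p++;
        }
    }
    return *p == '\0';                /* reject trailing text or extra weights */
}
```

A caller would report the usage message and exit when the function returns 0, as the listing does when the token count is not five.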


/* returns the number of tokens found.
   The string str will be modified, tokens will be modified to the null character
*/
int token_string(char *str, char token, int maxtoks, int skipMult, char **tokens )
{
    char *tptr;
    int ntoks;
    int len, idx;
    int intok = 0;

    for ( len = 0, tptr = str; *tptr; tptr++, len++ )
    {
        if ( *tptr == token )
            *tptr = '\0';
    }

    ntoks = idx = 0;
    tptr = str;
    if ( !skipMult )
    {
        tokens[0] = str;
        ntoks = 1;
    }

    while ( ntoks < maxtoks && idx < len )
    {
        if ( skipMult )
        {
            if ( *tptr )
            {
                if ( !intok )
                {
                    tokens[ntoks++] = tptr;
                    intok = 1;
                }
            }
            else
                intok = 0;
        }
        else
        {
            if ( *tptr == '\0' )
                tokens[ntoks++] = tptr + 1;
        }
        idx++;
        tptr++;
    }

    return ntoks;
}

static int DoCoreSearching( struct CtConnectionTable *qct, FILE *infp, FILE *outfp )
{
    int cnt = 0;
    int nhit = 0;
    double *cord;
    char *sln;
    struct CtConnectionTable *ct;
    top_result *res;
    int natoms;
    int nParts;
    int hasCore;
    int err;
    static FILE *corefp;
    static int reportCores = -1;
    char *regid;

    if ( reportCores == -1 )
    {
        reportCores = 0;
        if ( (sln = getenv("DBTOP_CORES") ) )
        {
            corefp = fopen(sln, "w");
            if ( !corefp )
                fprintf(stderr, "Failed to open %s to report the core regids\n", sln );
            else
            {
                reportCores = 1;
                fprintf(stderr, "Writing the regid for each structure with a core to %s\n",
                    sln );
            }
        }
    }

    time(&tnow);
    fprintf(outfp, "#SYBYL/3DB HITLIST\n#\n");
    fprintf(outfp, "# Created: %s", ctime(&tnow) );
    fprintf(outfp, "#\n#@CLASS STRLIST\n#\n");

    fprintf(outfp, "#@FIELD CORESIM\tINT\n");
    fprintf(outfp, "#@FIELD TS_UNIQ_ID\tINT\n");
    fprintf(outfp, "#@FIELD TS_HIT_ID\tINT\n");
    fprintf(outfp, "#@FIELD TS_ATTACH_PEN\tINT\n");
    fprintf(outfp, "#@FIELD TS_FEATURE\tINT\n");
    fprintf(outfp, "#@FIELD TS_STERIC\tINT\n");
    fprintf(outfp, "#@FIELD TS_QID\tINT\n");

    err = TOP_CORE_QUERY(qct, outfp);
    if ( err )
        return err;

    while ( UTL_SCAN_GETS(infp, "\\", (char *) 0, &sln ) > 0 )
    {
        if ( *sln == '#' )
            continue;

        cnt++;
        UTL_ERROR_CLEAR();
        ct = DB_IMPORT_SLN(sln);

        if ( !(cnt % 1000) )
        {
            time(&tnow);
            fprintf(stderr, "core searching hit %3d of %4d %s", nhit, cnt, ctime(&tnow) );
        }

        if ( !ct )
            continue;
        cord = (double *) 0;
        DB_CT_GET_CT_ATTR(ct, CtCt3DCoordSet, &cord, &natoms );
        if ( !cord )
        {
            DB_CT_DELETE_CT(ct);
            continue;
        }

        DB_CT_UTL_COUNT_FRAGS(ct, 0, (int *) 0, 0, (int *) 0, &nParts );
        if ( nParts != 1 )
        {
            DB_CT_DELETE_CT(ct);
            continue;
        }

        if ( normalize )
        {
            DB_CT_NORM_AROM(ct);
            DB_CT_STANDARD(ct, (int *) 0);
        }

        DB_CT_UTL_FIND_RINGS(ct);
        UTL_ERROR_CLEAR();
        regid = (char *) 0;
        DB_CT_GET_CT_ATTR(ct, CtCtRegId, &regid );
        if ( !regid )
            DB_CT_GET_CT_ATTR(ct, CtCtName, &regid );

        res = TOP_CORE_SEARCH(ct, radius, max_attachpen, &hasCore );
        if ( corefp && hasCore )
        {
            regid = (char *) 0;
            DB_CT_GET_CT_ATTR(ct, CtCtRegId, &regid );
            if ( !regid )
                DB_CT_GET_CT_ATTR(ct, CtCtName, &regid );
        }

        if ( res )
        {
            DB_CT_WRITE(outfp, res->strFrags[0] );
            DB_CT_WRITE(outfp, res->strFrags[1] );
            fflush(outfp);
            nhit++;
        }

        DB_CT_DELETE_CT(ct);
    }

    time(&tnow);
    fprintf(stderr, "core searching hit %3d of %4d %s", nhit, cnt, ctime(&tnow) );
    return 0;
}

static int DoMatrixSearching(FILE *infp, FILE *outfp )
{
    char **slns;
    int alloc_slns;
    int nused;
    char *sln;
    int *matrix;
    int i, j;
    int matrixSize;

    nused = 0;
    alloc_slns = 501;
    slns = (char **) calloc(alloc_slns, sizeof(char *) );

    while ( UTL_SCAN_GETS(infp, "\\", (char *) 0, &sln ) > 0 )
    {
        if ( *sln == '#' )
            continue;
        if ( nused >= alloc_slns )
        {
            alloc_slns *= 2;
            slns = (char **) realloc((char *) slns, alloc_slns * sizeof(char *) );
        }
        slns[nused] = strdup(sln);
        nused++;
    }

    matrix = TOP_MATRIX_SEARCH(slns, nused);
    if ( !matrix )
        return -1;

    matrixSize = nused * nused;
    for ( i = 0; i < matrixSize; i++ )
    {
        fprintf(outfp, "%d\n", matrix[i]);
    }

    return 0;
}


APPENDIX "B" - CT_TOP.H


#define TRIPOS_VERSION 1

typedef enum
{
    UseUnityFeatures,
    UsePreferredUnityFeatures,
    UseTopomerFeatures
} FeatureSetName;

typedef struct top_result_def
{
    struct CtConnectionTable *ct; /* is NOT FREED by TOP_FREE_RESULT, managed by caller */
    int idx;
    void *userdata; /* pointer to something else if needed */
    int filtered;
    double comfa_diff;
    double best2; /* best 2 piece hit */
    double best3; /* best 3 piece hit */
    double bestSub; /* best subset hit, when enabled */
    int hit3Piece; /* if true a 3 piece fragment was hit */
    struct CtConnectionTable *qFrags[3]; /* call TOP_FREE_RESULT to free memory, just pointers */
    struct CtConnectionTable *strFrags[3]; /* copies */
    double hexDiffs[3];
    double featureDiffs[3];
    double attachmentPenalty; /* for 3 piece only */
    int qids[3];
    int strids[3];
    int outside[3];
} top_result;

/* Topomer heterogenic searching functions.
   1st call TOP_QUERY_OPTIONS with the query ct
   2nd call TOP_COMPARE_WDETAIL or TOP_COMPARE to do a topomer comparison
*/

/* only hits return a non-null pointer, use radius = -1.0 to return all results */
int TOP_QUERY_OPTIONS(struct CtConnectionTable *ct, int do2piece, int do3piece, int doSubset,
    int minatoms, int autoScale, int partialMatch, int terminalFlag, int fallbackFlag,
    int hevDiff, int filterFlag, double reductionFactor, double featureFactor,
    double attachmentFactor, double stepSize, FeatureSetName featureSet,
    int useFeatureCharges, double *feat_weights, double extraPenalty, FILE *queryfp,
    int debugLevel );

top_result *TOP_COMPARE_WDETAIL(struct CtConnectionTable *ct, double radius, int idx,
    int keepCts );

double TOP_COMPARE(struct CtConnectionTable *ct, double radius, int *filtered, int idx );
/* TOP_COMPARE is faster, but no detail is returned, only the comfa_diff,
   negative upon failure
   results are returned even if below radius */

void TOP_FREE_RESULT(top_result *res, int freeRef);
void TOP_QUERY_DUMP(FILE *fp, char *id_fieldname );
int TOP_GET_STATS(int dumpRegions, int *r_tfrags, int *r_2compare, int *r_3compare,
    int *r_fcompare, int *r_filtered, int *r_feat, double *r_outsidePerc );
int TOP_HEV_COUNT(struct CtConnectionTable *ct);
top_result *TOP_CORE_SEARCH(struct CtConnectionTable *ct, double radius, double max_attachpen,
    int *r_hascore );
int TOP_CORE_QUERY( struct CtConnectionTable *ct, FILE *fp);
int *TOP_MATRIX_SEARCH(char **slns, int numSlns );

APPENDIX "C" - CT_TOP.C


#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <ctype.h>
#include <string.h>
#include <malloc.h>

#include "ct.h"
#include "import_proto.h"
#include "ct_int.h"
#include "ct_proto.h"
#include "srch2_proto.h"

#include "utl_mem.h"
#include "utl_str.h"
#include "utl_error.h"
#include "set.h"

#include "utl_geom.h"
#include "utl_set.h"
#include "comfa.h"
#include "ct_top.h"


#ifndef TRUE
#define TRUE 1
#endif

#define SPLITDEBUG 1
/*
#define DEBUGVALIDB
#define HEV_STATS 1
#define CALC_BATCH_DIFF 1
#define USE_HEX 1
#define STDREGION 1
#define NO_COMPRESSION 1
#define NUMBER_OF_COMPRESSION_FIELDS 5
#define NO_STRMAP 1
#define DEBUG_DETAIL 1
*/

#define MAX_FEATURES 200

#ifdef NUMBER_OF_COMPRESSION_FIELDS
#define COMPRESSION_POINTS NUMBER_OF_COMPRESSION_FIELDS
#else
#define COMPRESSION_POINTS 0
#endif

#define NO_REGIONS 11
static int maxregions;
static double qxmin = 999.0;
static double qymin = 999.0;
static double qzmin = 999.0;
static double qxmax = -999.0;
static double qymax = -999.0;
static double qzmax = -999.0;
static double aggreg_descale = 0.85;


struct bond_detail_rec {
    set_ptr to_atts;  /* if this is a topomerically labile bond,
                         points to set of atoms in fragment rooted at "to" */
    int best[3];      /* " " , ordered best three attachments to the "to" atom */
    int identical[2]; /* " " , TRUE if n'th and n-1'th attachments are identical */
    int natlvs2[2];   /* " " , difference, in # atoms, between n'th and n-1'th attachment */
    int lastnat[2];   /* " " , # ats in n-1'th attachment */
};

struct bond_top_rec {
    int from, to; /* end atom IDs */
    struct bond_detail_rec *detail; /* FALSE if bond is not topomerically labile */
};

struct top_graph {
    int maxatoms, maxbonds; /* allocated maximum values */
    int natoms, nbonds;
    int *bstart; /* pointers to first bond_top_rec for each atom */
    struct bond_top_rec *bstuff;
};

typedef struct aromset def { 
40 int numAtoms; 

int *atoms; 
} AromSet; 

typedef struct frag def { 
45 int baseAtom; 

int copyBaseAtom; /* baseAtom is from the Original ct, copyBaseAtom references this ct, the 
fragment */ 

int atomCnt; 
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int hevCnt; 
int aromCnt; 
int id; 
int outside; 

int npoints; /* number of points in this region, sizeof topField */ 
int regionldx; /* which region to use, deterines size of *topField */ 
int *atoms; 

struct CtConnectionTable *ct; 

double *cords; /* a pointer into the ct's cordinates, don't free */ 
double *topField; 
#ifdef STDREGION 

double *stdField; 
double *stdDiff; 

#endif 

#ifdef USE_HEX 

char *topHex; 

char *topInt; /* parsed string of ints , well chars valued 0-15*/ 
int topIntSize; 

#endif 

double *AtWts; 

double *hexDiff; /* sizeof number of fragments for comparing against current compound X */ 
double *featureDiff; 
double *feature2PDiff; 
double *feature3PDiff; 
double *featureSubsetDiff; 

int *origMapping; /* Maps this ct's atoms into the ct into Split */ 

double *cent; /* aromCnt * 4, x, y, z, and attrition factor is the 4th double */ 

double outsidePenalty; 

double *qtf[NO_REGIONS] ; /* query topomer fields */ 

} Frag; 
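
/* Illustrative sketch, not part of the original listing. The topInt member
   above is described as a parsed string of chars valued 0-15. Assuming that
   topHex holds one hexadecimal digit per grid point, decoding two such
   strings and comparing them might look like the code below. The names
   hex_to_nibbles and nibble_sq_diff are hypothetical; they are not the
   listing's hexStringToInts or fieldIntDiff, whose bodies are not shown. */

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a hex string such as "0f3a" into an array of
   chars valued 0-15. Returns NULL on a non-hex character or on allocation
   failure; the caller frees the result. */
static char *hex_to_nibbles(const char *hex, int *r_size)
{
    size_t n = strlen(hex);
    char *out = (char *) malloc(n ? n : 1);
    size_t i;

    if (!out)
        return NULL;
    for (i = 0; i < n; i++) {
        char c = hex[i];
        if (c >= '0' && c <= '9')
            out[i] = (char) (c - '0');
        else if (c >= 'a' && c <= 'f')
            out[i] = (char) (c - 'a' + 10);
        else if (c >= 'A' && c <= 'F')
            out[i] = (char) (c - 'A' + 10);
        else {
            free(out);
            return NULL;
        }
    }
    *r_size = (int) n;
    return out;
}

/* Sum of squared differences over the shorter of the two arrays. */
static double nibble_sq_diff(const char *a, int na, const char *b, int nb)
{
    int n = na < nb ? na : nb;
    double sum = 0.0;
    int i;

    for (i = 0; i < n; i++) {
        double d = (double) a[i] - (double) b[i];
        sum += d * d;
    }
    return sum;
}
```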


typedef struct split2_def {
int bondId;
int frag1;
int frag2;
int *b1;
int *b2;
#ifndef NO_STRMAP
int *strMap; /* size of number of 2 piece fragments in structure see alloc2Map */
int *subsetMap; /* size is the number of 3 piece fragments in the structure see
allocSubsetMap */
#endif
} split2;

typedef struct split3_def {
int bond1;
int bond2;
int frag1;
int frag2;
int frag3;
int frag4;

int *b1; /* atoms, change to a1,a2,a3 */
int *b2;
int *b3;
int *b4;
#ifndef NO_STRMAP
int *strMap; /* size of number of 3 piece fragments in structure see alloc3Map */
#endif
} split3;


typedef struct split_def {
split2 *s2;
split3 *s3;
Frag *frags;

struct CtConnectionTable *ct;
int s2cnt;
int s3cnt;
int numFrags;
int atomCount; /* number of atoms in the ct */
int *atomMask; /* Which atoms are Hev atoms, and optionally not terminal atoms */
int bondCount; /* Number of bonds in the ct */
int *bondMask; /* Bonds where splits occur */
int *singleBonds; /* single bonds not in rings, and not to primary atoms, H,Cl,Br */
int numHev; /* number of heavy atoms in the ct */
int *featureMask; /* array the size of atomCount. Mask representing which features
each atom has. */
int featureCnts[5]; /* total number of features, by type */
int *aromMask; /* for features, the atoms which hit one of the aromatic patterns */
int numArom;
AromSet *aromSets; /* an array the size of numArom */
int fragsBuilt;
int connectedHBTotalCnt;
int *connectedHBCnt; /* size of atomCount. # of connected atoms which are HBA & HBD and
atom is HBA or HBD */
int *connectedHBAtoms; /* size of atomCount * 5 */
#ifndef NO_STRMAP
int alloc2Map;
int alloc3Map;
int allocSubsetMap;
#endif
} Split;

typedef struct branch_info_def {
int toAtom;
int chainSize;
double molWeight;
} branchInfo;

typedef enum
{
FeatureNone = 0x0,
FeatureArom = 0x1,
FeaturePos = 0x2,
FeatureNeg = 0x4,
FeatureHBA = 0x8,
FeatureHBD = 0x10
} FeatureType;
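
/* Illustrative sketch, not part of the original listing. Because the
   FeatureType values are distinct bits, one int per atom can carry several
   features at once, and a bitwise AND tests each one. The code below counts
   flagged atoms per feature type under that assumption. The FEAT_* names and
   count_features are hypothetical; note also that the listing takes its
   aromatic count from numArom (a count of aromatic sets) rather than from
   per-atom flags, so slot 0 here is only an analogy. */

```c
#include <stddef.h>

enum {
    FEAT_AROM = 0x1,
    FEAT_POS  = 0x2,
    FEAT_NEG  = 0x4,
    FEAT_HBA  = 0x8,
    FEAT_HBD  = 0x10
};

/* Count, per feature type, how many atoms carry that flag. counts[] has five
   slots ordered Arom, Pos, Neg, HBA, HBD. A NULL mask yields all zeros. */
static void count_features(const int *mask, int natoms, int counts[5])
{
    static const int flags[5] = { FEAT_AROM, FEAT_POS, FEAT_NEG, FEAT_HBA, FEAT_HBD };
    int i, k;

    for (k = 0; k < 5; k++)
        counts[k] = 0;
    if (!mask)
        return;
    for (i = 0; i < natoms; i++)
        for (k = 0; k < 5; k++)
            if (mask[i] & flags[k])
                counts[k]++;
}
```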

static FeatureType fMasks[4] = { FeaturePos, FeatureNeg, FeatureHBA, FeatureHBD };
typedef struct feature_pat_def
{
FeatureType f_type;
int weight;
int atomicId; /* if non-zero this atomic id must be present, Nitrogen and Oxygen are the only
ones checked for */
int ringIndicator; /* if non-zero, indicates ring membership: 1 is must be in ring, -1 is must
not be in ring */
char *sln;
struct CtConnectionTable *ct;
void *pattern;
} FeaturePattern;

typedef struct {
fpt lo[3], /* corner with lowest values for each axis */
hi[3], /* " " hi-est " */
stepsize[3]; /* increment between points */
int nstep[3], /* derived as 1 + (hi-lo + epsilon) / stepsize */
n; /* n = product of nstep[i] */

int atomtype; /* SYBYL atom type, for steric energy computation */
fpt pt_charge; /* elemental charge at point, for electrostatics */
fpt *weight; /* weight[n] is applied in all computations, e.g. =1 */
int avgtype; /* box of 'scale', sphere, sphere x vdw, ...? */
fpt avgscale; /* scale whose meaning derived from avg_type */
int arb, /* arbitrary int for later use */
*parb; /* " pointer " " */
} l_Box, *l_BoxPtr ;
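
/* Illustrative sketch, not part of the original listing. The comments in
   l_Box state that nstep is derived as 1 + (hi-lo + epsilon) / stepsize and
   that n is the product of nstep[i]. The code below works that arithmetic
   through, assuming integer truncation and an epsilon of 1.0e-6; both the
   epsilon value and the names box_steps and box_points are assumptions. */

```c
/* Number of lattice points along one axis: 1 + (hi - lo + epsilon) / stepsize,
   with the quotient truncated toward zero. */
static int box_steps(double lo, double hi, double stepsize)
{
    const double epsilon = 1.0e-6; /* assumed tolerance, not from the listing */

    return 1 + (int) ((hi - lo + epsilon) / stepsize);
}

/* Total number of lattice points in the box: the product over the three axes. */
static long box_points(const double lo[3], const double hi[3], const double step[3])
{
    long n = 1;
    int i;

    for (i = 0; i < 3; i++)
        n *= box_steps(lo[i], hi[i], step[i]);
    return n;
}
```

/* For example, an axis running from -4.0 to 4.0 with a step of 2.0 holds
   five points (-4, -2, 0, 2, 4). */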

typedef struct {
char *filename ; /* name of the region's file (if any) */
int n_boxes; /* number of boxes which make up the region */
int n_points ; /* number of points in this region altogether */
l_BoxPtr box_array; /* box_array[n_regions], each one a Box */
int nrefs ; /* number of CURRENT references to this memory */
long whenmade; /* creation stamp */
} l_ComfaRegion, *l_RegionPtr ;


typedef struct { 

unsigned int crc; 

char *sln; 

int hitcnt; 
} UniqSln; 


static l_ComfaRegion *regions[NO_REGIONS];
static int regionUseCnts[NO_REGIONS];
static l_RegionPtr stdRegion;
static int minRegion; 
static int minRegion2P; 
static int minRegion3P; 


static int totfrags; 
static int tot_uniq_frags; 
static int compounds; 
static int searchCnt; 

static int t_2compare; 
static int t_3compare; 
static int tfcompare; 
static int t_filtered;
static int t_featFiltered;
static int t_outside;
static int t_fields;


static int *g_atomDist; 

static struct CtConnectionTable *g_ct; 


static double def_featureWeights[6] = { 20.0, 200.0, 200.0, 100.0, 100.0 }; 
static double featureWeights[6] = { 20.0, 200.0, 200.0, 100.0, 100.0 }; 
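
/* Illustrative sketch, not part of the original listing. TOP_QUERY_OPTIONS
   later copies five weights (Arom, Pos, Neg, HBA, HBD) either from the
   caller or from the defaults above, and sets the default Pos and Neg
   weights to 0.0 when useFeatureCharges is set. A compact restatement of
   that selection, with the hypothetical names select_weights and
   default_weights, might be: */

```c
#include <stddef.h>

#define N_FEATURES 5

static const double default_weights[N_FEATURES] = { 20.0, 200.0, 200.0, 100.0, 100.0 };

/* Fill out[] with the caller's weights if given, else the defaults. When
   defaults are used and the charge flag is set, the Pos and Neg weights
   (slots 1 and 2) are zeroed. Caller-supplied weights are copied unchanged. */
static void select_weights(const double *user, int use_charges, double out[N_FEATURES])
{
    int i;

    for (i = 0; i < N_FEATURES; i++)
        out[i] = user ? user[i] : default_weights[i];
    if (!user && use_charges)
        out[1] = out[2] = 0.0;
}
```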


/* Local prototypes */ 


struct top_graph *TOP_INIT_GRAPH( struct top_graph *g, struct CtConnectionTable *ct );
static void ashow( set_ptr aset );

static Split *FindBreakPoints(CtConnectionTable *ct, int minHev, int termflag, int createFrags );
static int findDirectionalNeighbors(CtConnectionTable *ct, int atomIdx, int terminalAtomIdx, int
    termIdx2 );
static double *computeVdwWeights(CtConnectionTable *ct, int atomIdx, int terminalAtomIdx, double
    reductionFactor, int **r_covered );
static int *createAtomMask(CtConnectionTable *ct, int termflag, int *r_hevCount);
static int validBreakPoint(CtConnectionTable *ct, int bondidx, int *atomMask, int minHev, int termflag,
    int **rb1, int **rb2 );
static int addSplit2(int bondId, int *b1, int *b2 );
static int addSplit3(int atomCnt, int bond1, int bond2, int *b1, int *b2, int *b3, int firstBase, int
    secondBase );
static void freeSplit(Split *s);
static void freeSplit2(split2 *s2, int cnt );
static void freeSplit3(split3 *s3, int cnt );
static void freeFrags(Frag *f, int cnt );
static void freeFragCts(Split *S);
static int freeStrMap(Split *S);
static int atomsOverlap(int atomcnt, int *b1, int *b2);
static int hevCount(int atomcnt, int *b, int *atomMask, int *r_numAtoms );
static int createFrag(int atomCnt, int *atoms, int *atomMask, int checkDup );
static Frag *createUniqFrags(int atomCnt, split2 *s2, int nums2, split3 *s3, int nums3, int *atomMask,
    int *r_numFrags );
static int getAtomIds(CtConnectionTable *ct, int a1, int *r_a2, int *r_a3 );

static double fieldHexDiff( char *cptr, char *cqtr, int nosq );
static double CompareAHFeatures(Split *query, Split *str, double radius );
static double CompareTwoCompounds(Split *query, Split *str, double radius, int *r_qidx, int *r_sidx,
    int *r_splitidx, int *r_three, int *r_subsetHit, double *best2, double *best3, double *bestSub, double
    *r_attPen, int bailedout );
char *CT_FIELD2HEX( double *field, int size );
static char *hexStringToInts(char *cptr, int *r_size);
static double fieldIntDiff( char *cptr, char *cqtr, int s1, int s2 );
static double topFieldDiff(double *qry, double *str, int npoints );
static double topFieldCompressedDiff(double *qry, double *str, int npoints, double startPenalty );
static double fieldIntDiffSq( unsigned short *cptr, unsigned short *cqtr, int s1, int s2);
static double *computePadiWeights(struct CtConnectionTable *ct, int baseAtom, int *atomDist, int
    *featureMask, int *ctMap );
static int getFromAtom(struct CtConnectionTable *ct, int *atomdist, double *molWeights, int atom, int
    toAtom, int baseAtom, double *cord );
static int debugHits( FILE *fp, Split *query, Split *str, int bestq, int bestStr, int bestIdx, int threeMatched
    );
static int topAlignCt(struct CtConnectionTable *ct, int baseAtom, int *featureMask, int *ctMapping );
static int traverseBranch( struct CtConnectionTable *ct, int atomId, int *atomdist, double *molweight,
    int rootToAtom, int *r_toatom, int *r_length, double *r_weight );
static int *findLargestBranch(struct CtConnectionTable *ct, int *atomdist, double *weights );
static CtBond *getBond(struct CtConnectionTable *ct, int id1, int id2 );
static int setTorsion(double *coo, int nAtoms, int a1, int a2, int a3, int a4, double value );
static int reflectAtoms( double *coo, int nAtoms, int npt, int *aplane );
static int setBaseTorsion(double *coo, int nAtoms, int a3, int a4, double value );
static int setRootTorsion(double *coo, int nAtoms, int a2, int a3, int a4, double value );
static int get_details( top_result *res, Split *query, Split *str, int bestq, int bestStr, int bestIdx, int
    threeMatched, int subsetHit, int keepCts );
static top_result *top_compare(struct CtConnectionTable *ct, double radius, int details, int idx, int
    keepCts );

static struct CtConnectionTable *makeFragCopy(struct CtConnectionTable *ct, int id, int hexdiff );
static void writeCopy(FILE *fp, struct CtConnectionTable *ct, int id, int hexdiff, char *fieldname );
static void setAttr(struct CtConnectionTable *ct, char *name, char *value );
static double computeAttachmentPenalty( Frag *qry, Frag *str, Frag *other_qry, Frag *other_str );
static FeaturePattern *InitFeaturePatterns(int *r_numPatterns);
static int SearchForFeatures(Split *S);
static int computeCentroid( double *cords, int *atoms, int numAtoms, double *r_x, double *r_y, double
    *r_z);
static void addCentroid(Frag *fptr, int natoms, double attrFact, double x, double y, double z );
static double compareFeatures(Split *qs, Frag *qry, Split *ss, Frag *str, int query2ndAttach, int
    str2ndAttach );
static double featureScaling(int *featureCnts, int *extraFeatureCnts, double *featureContributions, int
    nbest );
static int BuildTopomers(CtConnectionTable *ct, Split *S, Split *query);
static int BuildFrags(Split *S);
static int atomsOutside(double *coords, int natoms, l_RegionPtr regp, double *atwts, double *r_outpen
    );
static int makeTopRegions(double stepSize, int numFrags);
static l_RegionPtr getRegionToUse(double *coords, int natoms, int *r_idx, int *n_points );
static void getQueryExtents(double *coords, int atomCnt );
static int getCordExtents(double *coords, int natoms, double *r_minx, double *r_miny, double *r_minz,
    double *r_maxx, double *r_maxy, double *r_maxz );
static double *compressField(double *fptr, int npoints );
static int compareFields(double *orig, double *atombased, int npoints );
static void stripCharge(struct CtConnectionTable *ct, CtAtom *aptr, int atomidx);
static int dupCheckCore(struct CtConnectionTable *ct, int *r_uniqid, int *r_hitid );
struct CtConnectionTable *getLargestFrag(struct CtConnectionTable *ct );
static void CoverConnectedHB(Split *qs, struct CtConnectionTable *ct, double *HB);
static int double_compare(const void *vnrec, const void *vtrec );
static double MeasureClosest(Split *qs, Frag *q1, Split *str, Frag *f1, double *da, double *aa, int
    *r_nofeatures);
static void PartialMatchFeatures(Split *qs, int mode, Frag *q1, Frag *q2, Frag *q3, Frag *q4, Split *str,
    Frag *f1, Frag *f2, Frag *f3, Frag *f4, int matchCnt );
static int makeSplit3(CtConnectionTable *ct, int *atomMask, split2 *sall, int cnt, int minHev );
static int getFromChiralAtoms(struct CtConnectionTable *ct, int *atomdist, double *molw, int atom, int
    toAtom, int *r_fromAtom, int *r_toatom);
static int getFromRingCount(struct CtConnectionTable *ct, int *atomdist, int atom, int toAtom );
static double get_path_mw( set_ptr aset, struct CtConnectionTable *ct, double mw );


static split2 *g_split2;
static int g_splitcnt;
static int g_splitalloc;

static split3 *g_split3;
static int g_split3Cnt;
static int g_split3Alloc;

static Frag *g_fragHead;
static int g_fragCnt;
static int g_fragAlloc;

static char *regid; 


/* Query options */ 

static struct CtConnectionTable *q_ct; 

static double q_bailout;
static FeatureSetName q_featureSet;
static int q_useFeatureCharges;
static double q_attachPenFactor = 100.0;
static double q_featureFactor = 1.0;
static double q_extraFeatureFactor = 0.1;
static int q_minatoms; /* minimum HEV atoms per fragment */
static int q_autoScale; /* automatic scaling of sensitivity of neighbors based upon the query. */
static int q_partialMatch; /* partial match count for HBA and HBD */
static double autoScaleFactor; /* steric auto scaling factor */
static int q_termFlag; /* if TRUE term atoms are counted */
static int q_do2piece; /* if TRUE do 2 piece comparisons */
static int q_do3piece; /* if TRUE do 3 piece comparisons */
static int q_doSubset; /* if TRUE do subset comparisons, 2 piece query with 3 piece structure. Hit larger
compounds */

static int q_minSubsetSize = 15;
static int q_matrixMode;
static int q_coremode;
static int q_coremode_align;

static int q_fallback; /* if TRUE fallback on minatoms to 3 and count terminal atoms */
static int q_hevDiff; /* maximum allowed hev atoms, inclusive */
static int q_filter; /* if TRUE filtering is enabled */
static int q_regionMode;
static double q_stepSize = 2.0;

static double q_ReductionFactor = 0.85; /* reduction factor */
static int q_debugLevel;
static FILE *q_debugfp;
static FILE *debug2;

static Split *qs; /* query split structure & topomers */ 

static int qmode; 

#if 0
int top_test_debug(char *fname)
{
    if ( debug_fp )
        fclose(debug_fp);
    debug_fp = (FILE *) 0;
    if ( fname )
        debug_fp = fopen(fname, "w");
    return 0;
}
#endif


int TOP_QUERY_OPTIONS(struct CtConnectionTable *ct,
    int do2piece, int do3piece, int doSubset, int minatoms, int autoScaleSteric, int partialMatch,
    int terminalFlag, int fallbackFlag, int hevDiff, int filterFlag,
    double reductionFactor, double featureFactor, double attachmentFactor,
    double stepSize, FeatureSetName featureSet, int useFeatureCharges, double *feat_weights, double
    extraPenalty,
    FILE *debug_fp, int debugLevel )
{
    int i;
    double *cord;
    double *wptr;
    int numSplits;

    if (ct && ! DB_CT_GET_CT_ATTR( ct, CtCt3DCoordSet, &cord, &i))
    {
        UTL_ERROR_CLEAR();
        return -1;
    }
    UTL_ERROR_CLEAR();
    if ( feat_weights )
        wptr = feat_weights;
    else
        wptr = def_featureWeights;
    if ( useFeatureCharges )
        def_featureWeights[1] = def_featureWeights[2] = 0.0;
    else
        def_featureWeights[1] = def_featureWeights[2] = 200.0;
    for ( i = 0; i < 5; i++ )
        featureWeights[i] = wptr[i];

    qmode = 1;
    if (ct)
    {
        DB_CT_NORM_AROM(ct);
        DB_CT_STANDARD(ct, (int *) 0);
        DB_CT_UTL_FIND_RINGS(ct);
    }

    numSplits = 8;
    if ( minatoms < -1 )
    {
        fallbackFlag = numSplits = minatoms * -1;
        minatoms = ct->atomCount / 2;
    }

    q_featureSet = featureSet;
    q_useFeatureCharges = useFeatureCharges;
    q_extraFeatureFactor = extraPenalty;
    q_minatoms = minatoms;
    q_autoScale = autoScaleSteric;
    if ( q_autoScale < 0 )
        q_autoScale = 0;
    if ( q_autoScale && q_autoScale < 20 )
        q_autoScale = 20;
    q_partialMatch = partialMatch;
    q_termFlag = terminalFlag;
    q_do2piece = do2piece;
    q_do3piece = do3piece;
    q_doSubset = doSubset;
    q_fallback = fallbackFlag;
    q_filter = filterFlag;
    q_debugfp = debug_fp;
    q_debugLevel = debugLevel;
    q_hevDiff = hevDiff;
    q_ReductionFactor = reductionFactor;
    q_featureFactor = featureFactor;
    q_attachPenFactor = attachmentFactor * attachmentFactor; /* square what is passed in */
    q_stepSize = stepSize;

    if (ct)
    {
        fprintf(stderr, "Initializing query...\n");
        qs = FindBreakPoints(ct, minatoms, terminalFlag, TRUE );
        i = minatoms;
        if ( terminalFlag == 0 )
            i--;
        if ( q_fallback > 1 )
        {
            while ( (!qs || qs->s2cnt < q_fallback ) && i >= 3)
            {
                if ( qs )
                    freeSplit(qs);
                qs = FindBreakPoints(ct, i, 1, TRUE );
                q_minatoms = i;
#ifdef TRIPOS_VERSION
                if (qs)
                    fprintf(stderr," Minatoms: %d number of fragments: %d 2piece:%d 3piece: %d\n",
                        i, qs->numFrags, qs->s2cnt, qs->s3cnt );
#endif
                i--;
            }
        }
        else
        {
            if ( !qs || qs->numFrags == 0 )
                fallbackFlag = 1;
            while ( (!qs || qs->numFrags == 0) && i >= 3)
            {
                if (qs)
                    freeSplit(qs);
                qs = FindBreakPoints(ct, i, 1, TRUE );
                i--;
            }
        }
#ifdef TRIPOS_VERSION
        if ( q_minatoms != minatoms )
            fprintf(stderr, "running the query with a minimum heavy atom count of %d vs %d\n",
                q_minatoms, minatoms );
#endif
        if (qs)
        {
            qs->ct = ct;
            SearchForFeatures(qs);
            BuildFrags(qs);
            BuildTopomers(ct, qs, (Split *) 0);
            fprintf(stderr," query initialized. \n");
        }
    }
    qmode = 0;

    if ( qs && qs->numFrags > 0 )
    {
        /* 25 is just a guess as of right now, 1/19/01. Need to evaluate.
           Small structures are hitting too many compounds. So we
           need to make the steric and features more sensitive;
           large structures are not hitting enough structures so make
           less sensitive.
           example values: 12 hev atoms 25.0 / 12.0 = 2.1
           increases the steric contribution by a little bit more than twice as much.
           50 hev atoms 25.0 / 50.0
           = 0.5 would decrease the steric contribution by half. This may be too much
           75 hev atoms 25.0 / 75.0
           = 0.33 would decrease the steric contribution by 1/3. again maybe too much.
        */
        if ( q_autoScale )
        {
            autoScaleFactor = (double) q_autoScale / (double) qs->numHev ; /*
                based upon average drug like structure containing 25 heavy atoms */
            if ( autoScaleFactor < 1.0 )
                autoScaleFactor = (2.0 + autoScaleFactor) / 3.0;
            fprintf(stderr,"Auto steric scaling factor : %8.2lf\n", autoScaleFactor );
        }
        else
            autoScaleFactor = 1.0;
        return 0; /* everything is just fine, found some fragments */
    }
    if ( qs )
    {
        freeSplit(qs);
        qs = (Split *) 0;
    }
    qmode = 0;
    return -2; /* failed */
}

void TOP_QUERY_DUMP(FILE *fp, char *id_fieldname )
{
    int i;
    Frag *f;

    if ( !fp || !id_fieldname || !qs )
        return;
    if (qs->ct)
        DB_CT_WRITE(fp, qs->ct);
    for ( i = 0; i < qs->numFrags; i++ )
    {
        f = qs->frags + i;
        if (f->ct)
            writeCopy(fp, f->ct, i, -1, id_fieldname );
    }
}
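
/* Illustrative sketch, not part of the original listing. The auto steric
   scaling in TOP_QUERY_OPTIONS divides the scale setting by the query's heavy
   atom count, and, when the ratio falls below 1.0, pulls it two thirds of the
   way back toward 1.0 via (2.0 + f) / 3.0. The function below restates that
   arithmetic on its own; the name auto_scale_factor and the guard against a
   non-positive heavy atom count are assumptions added for the example. */

```c
/* Steric scaling factor for a query with num_hev heavy atoms.
   auto_scale <= 0 disables scaling; non-zero values below 20 are raised to 20,
   as in the listing. */
static double auto_scale_factor(int auto_scale, int num_hev)
{
    double f;

    if (auto_scale <= 0 || num_hev <= 0)
        return 1.0;
    if (auto_scale < 20)
        auto_scale = 20;
    f = (double) auto_scale / (double) num_hev;
    if (f < 1.0)
        f = (2.0 + f) / 3.0; /* soften the reduction for large queries */
    return f;
}
```

/* With a setting of 25, a 12 heavy atom query scales by about 2.08, while a
   50 heavy atom query scales by (2.0 + 0.5) / 3.0, about 0.83, rather than
   the raw ratio of 0.5. */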

top_result *TOP_COMPARE_WDETAIL( struct CtConnectionTable *ct, double radius, int idx, int
    keepCts )
{
    top_result *res;
    top_result *rescopy;

    if ( radius <= 0.0 )
        radius = 99999.9;
    res = top_compare(ct, radius, 1, idx, keepCts );
    if ( res && res->comfa_diff <= radius )
    {
        rescopy = (top_result *) malloc(sizeof(top_result) );
        memcpy((char *) rescopy, (char *) res, sizeof(top_result) );
        return rescopy;
    }
    else if ( res )
    {
        TOP_FREE_RESULT(res, 0 );
    }
    return (top_result *) 0;
}
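
/* Illustrative sketch, not part of the original listing. top_compare hands
   back a pointer to a static buffer, so TOP_COMPARE_WDETAIL copies the result
   to the heap before returning it, and treats a non-positive radius as "no
   limit". The stand-alone example below shows why the copy matters, using a
   hypothetical two-field demo_result in place of top_result. Unlike the
   listing, it also checks the malloc return value. */

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    double diff;
    int idx;
} demo_result; /* hypothetical stand-in for top_result */

/* Stand-in for top_compare: returns a pointer to a static buffer that is
   overwritten on every call. */
static demo_result *demo_compare(double diff, int idx)
{
    static demo_result scratch[1];

    scratch->diff = diff;
    scratch->idx = idx;
    return scratch;
}

/* Return a heap copy that the caller owns, or NULL when the difference lies
   outside the radius (or when allocation fails). */
static demo_result *demo_compare_copy(double diff, int idx, double radius)
{
    demo_result *res, *copy;

    if (radius <= 0.0)
        radius = 99999.9; /* non-positive radius means "no limit" */
    res = demo_compare(diff, idx);
    if (res->diff > radius)
        return NULL;
    copy = (demo_result *) malloc(sizeof *copy);
    if (copy)
        memcpy(copy, res, sizeof *copy);
    return copy;
}
```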

/*
   Compare the ct structure with 3D coordinates with
   the ct specified to TOP_QUERY_OPTIONS,
   returns the topomeric difference or a negative value upon
   failure or being filtered out.
   returns the filtered status through the filtered pointer.
   The input radius is passed in for filtering reasons
*/
double TOP_COMPARE(struct CtConnectionTable *ct, double radius, int *filtered, int idx )
{
    top_result *res;
    double comfa_diff;

    UTL_ERROR_CLEAR();
    *filtered = 0;
    if ( radius <= 0.0 )
        radius = 99999.9;
    res = top_compare(ct, radius, 0, idx, 0 );
    if ( res )
    {
        comfa_diff = res->comfa_diff;
        TOP_FREE_RESULT(res,0);
        return comfa_diff;
    }
    return -1.0;
}


static top_result *top_compare(struct CtConnectionTable *ct, double radius, int details, int idx, int
    keepCts )
{
    static top_result ts[1];
    int i;
    Split *s;
    double *cord;
    double comfa_diff, best2, best3;
    int qidx, sidx, splitidx, splitInThree, subsetHit;
    int bailedout;
    int strmin;
    static int envminSubsetSize = -1;
#ifdef HEV_STATS
    static FILE *bfp;
    char *regid;
#endif

    UTL_ERROR_CLEAR();
    if (!DB_CT_GET_CT_ATTR( ct, CtCt3DCoordSet, &cord, &i))
        return (top_result *) 0;
    DB_CT_UTL_FIND_RINGS(ct);

    if (q_fallback > 1 )
    {
        i = strmin = ct->atomCount / 2;
        s = FindBreakPoints(ct, i, q_termFlag, TRUE );
        if ( q_termFlag )
            i--;
        while ( (!s || s->s2cnt < q_fallback ) && i )
        {
            if (s)
                freeSplit(s);
            strmin = i;
            i--;
            s = FindBreakPoints(ct, i, 1, TRUE );
        }
#if 0
        fprintf(stderr, "structure min atoms: %d\n", strmin );
#endif
    }
    else
    {
        searchCnt++;
        s = FindBreakPoints(ct, q_minatoms, q_termFlag, TRUE );
        if (!s)
            return (top_result *) 0;
        i = q_minatoms;
        if ( q_termFlag )
            i--;
        while ( s && s->numFrags == 0 && i && q_fallback )
        {
            freeSplit(s);
            s = FindBreakPoints(ct, i, 1, TRUE );
            i--;
        }
    }

    if ( !s || !s->s2cnt)
    {
        if ( s )
            freeSplit(s);
        return (top_result *) 0;
    }

    if ( envminSubsetSize == -1 )
    {
        char *tptr;

        tptr = getenv("DBTOP_MIN_HEV");
        if ( tptr )
        {
            envminSubsetSize = atoi(tptr);
            if ( envminSubsetSize < 0 )
                envminSubsetSize = 0;
        }
        else
            envminSubsetSize = 0;
    }
    q_minSubsetSize = envminSubsetSize; /* qs->numHev - # some number */
    q_bailout = radius * radius;
    memset((char *) ts, '\0', sizeof(top_result) );
    s->ct = ct;
    SearchForFeatures(s);
    if ( q_featureFactor > 0.0 )
        ts->comfa_diff = CompareAHFeatures(qs,s,radius );
    if ( ts->comfa_diff <= radius )
    {
        if ( q_featureFactor > 0.0 )
            BuildTopomers(ct, s, qs);
        else
            BuildTopomers(ct, s, (Split *) 0 );
        ts->comfa_diff = CompareTwoCompounds(qs, s, radius, &qidx, &sidx, &splitidx,
            &splitInThree, &subsetHit,
            &(ts->best2), &(ts->best3), &(ts->bestSub),
            &(ts->attachmentPenalty), bailedout);
    }
    else
    {
        t_featFiltered++;
        qidx = -1; /* Indicate no indexing */
    }
    ts->ct = ct; /* save a pointer to the ct being compared */
    ts->idx = idx;

#ifdef HEV_STATS
    regid = (char *) 0;
    DB_CT_GET_CT_ATTR(ct,CtCtRegId, &regid );
    if ( !regid )
        DB_CT_GET_CT_ATTR(ct,CtCtName, &regid );
    if (!bfp)
        bfp = fopen("hev.stats", "w");
    fprintf(bfp,"%s %3d %3d %3d %3d %3d %3d %3d %3d %3d %3d\n", regid, s->numHev,
        qs->numHev,
        qs->numHev - s->numHev,
        abs(s->numHev - qs->numHev),
        (int) ts->comfa_diff, (int) ts->best2, (int) ts->best3,
        s->numFrags, s->s2cnt, s->s3cnt);
    if ( !(idx % 100 ) )
        fflush(bfp);
#endif

    if ( details && qidx >= 0 )
    {
        if ( get_details(ts, qs, s, qidx, sidx, splitidx, splitInThree, subsetHit, keepCts ) )
        {
            ts->comfa_diff = q_bailout;
            fprintf(stderr, "internal failure, please provide query, options, and structure below.\n");
            if (s->ct)
                DB_CT_WRITE(stderr, s->ct);
        }
    }
    freeSplit(s);
    return ts;
}
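
/* Illustrative sketch, not part of the original listing. When q_fallback is
   greater than 1, top_compare starts with a minimum fragment size of half
   the atom count and lowers it step by step until FindBreakPoints yields
   enough two-piece splits. The loop below isolates that relaxation idea.
   The callback type split_counter, the explicit floor, and the toy probe
   (a linear chain of 20 atoms) are all assumptions made so the example can
   run on its own. */

```c
/* Hypothetical probe: number of two-piece splits found when each fragment
   must hold at least min_hev heavy atoms. */
typedef int (*split_counter)(int min_hev);

/* Start at atom_count / 2 and lower the threshold until at least `wanted`
   splits are found or the threshold would drop below `floor_hev`. Returns
   the threshold that succeeded, or -1 if none did. */
static int relax_min_hev(int atom_count, int wanted, int floor_hev, split_counter count)
{
    int min_hev;

    for (min_hev = atom_count / 2; min_hev >= floor_hev; min_hev--)
        if (count(min_hev) >= wanted)
            return min_hev;
    return -1;
}

/* Toy probe: in an unbranched chain of 20 atoms, cutting bond k leaves
   fragments of k and 20 - k atoms, so the cuts that keep both fragments at
   or above min_hev are k = min_hev .. 20 - min_hev. */
static int toy_counter(int min_hev)
{
    int n = 20 - 2 * min_hev + 1;

    return n > 0 ? n : 0;
}
```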

static double CompareAHFeatures(Split *query, Split *str, double radius )
{
    double best;
    static Split *qfeatInit;
    static int qFeatures[5];
    int sFeatures[5];
    static int tsearched;
    double best2, best3, bestsub;
    double d1, d2, d3, d4, d5, d6;
    double dval[6];
    int hevCnts[6];
    double attPen[2];
    int bestQ, bestStr;
    int bestIdx;
    int threeIsBetter = 0;
    int SubIsBetter = 0;
    int id1, id2, id3, id4;
    int i, j, k, l;
    int ids[3];
    Frag *f, *sf;
    Frag *q1, *q2, *q3, *q4;
    Frag *fs1, *fs2, *fs3, *fs4;
    Frag *fragPtrs[3];
    Frag *qActive;
    split2 *qs2, *ss2;
    split3 *qs3, *ss3;
    double *dptr;
    double hexdiff;
    int max3;
    static Split *qInit;
    double bailout;
    static int t_quick;
    int combo2, combo3;
    int nskip2, nskip3;


    memset((char *) sFeatures, '\0', sizeof(int) * 5 );
    for ( i = 0; i < str->atomCount; i++ )
    {
        if ( str->featureMask )
        {
            if ( str->featureMask[i] & FeaturePos )
                sFeatures[1] += 1;
            if ( str->featureMask[i] & FeatureNeg )
                sFeatures[2] += 1;
            if ( str->featureMask[i] & FeatureHBA )
                sFeatures[3] += 1;
            if ( str->featureMask[i] & FeatureHBD )
                sFeatures[4] += 1;
        }
    }
    sFeatures[0] = str->numArom;
    if ( qfeatInit != query )
    {
        memset((char *) qFeatures, '\0', sizeof(int) * 5 );
        for ( i = 0; i < query->atomCount; i++ )
        {
            if ( query->featureMask )
            {
                if ( query->featureMask[i] & FeaturePos )
                    qFeatures[1] += 1;
                if ( query->featureMask[i] & FeatureNeg )
                    qFeatures[2] += 1;
                if ( query->featureMask[i] & FeatureHBA )
                    qFeatures[3] += 1;
                if ( query->featureMask[i] & FeatureHBD )
                    qFeatures[4] += 1;
            }
        }
        qFeatures[0] = query->numArom;
        qfeatInit = query;
        fprintf(stderr," Query feature counts Arom: %d Pos & Neg: %d & %d HBA & HBD: %d & %d \n",
            qFeatures[0], qFeatures[1], qFeatures[2], qFeatures[3], qFeatures[4] );
    }

#if 0
    fprintf(stderr, "structure feature counts Arom: %d Pos & Neg: %d & %d HBA & HBD: %d & %d \n",
        sFeatures[0], sFeatures[1], sFeatures[2], sFeatures[3], sFeatures[4] );
#endif
    tsearched++;
    if ( q_partialMatch == 0 )
    {
        for ( best = 0.0, i = 0; i < 5; i++ )
        {
#define SAFEFEATUREQUICK
#ifdef SAFEFEATUREQUICK
            if ( qFeatures[i] && !sFeatures[i] )
                best += featureWeights[i] * featureWeights[i] * (double) ( (qFeatures[i]
                    - sFeatures[i]) );
#else
            if ( qFeatures[i] > sFeatures[i] )
                best += featureWeights[i] * featureWeights[i] * (double) ( (qFeatures[i]
                    - sFeatures[i]) ) * q_ReductionFactor;
#endif
        }
        if ( best < 0.0 )
            best = 0.0;
        best = sqrt(best);
        if ( best > radius )
        {
            t_quick++;
            return 9999.00;
        }
    }
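
/* Illustrative sketch, not part of the original listing. The quick filter
   above (SAFEFEATUREQUICK branch) charges weight squared times the query
   count for every feature type that the query has and the structure lacks
   entirely, and rejects the structure when the square root of the total
   exceeds the radius. The stand-alone version below compares squares instead
   of taking the root, which gives the same decision for a non-negative
   radius. The name quick_feature_reject is hypothetical. */

```c
/* Returns 1 when the structure can be rejected on feature counts alone.
   q[] and s[] hold per-type feature counts for query and structure; w[]
   holds the per-type weights. */
static int quick_feature_reject(const int q[5], const int s[5], const double w[5],
                                double radius)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < 5; i++)
        if (q[i] > 0 && s[i] == 0)
            sum += w[i] * w[i] * (double) q[i]; /* q[i] - s[i] with s[i] == 0 */
    return sum > radius * radius;
}
```

/* With the default weights, a query with two hydrogen-bond acceptors against
   a structure with none accumulates 100 * 100 * 2 = 20000, i.e. a bound of
   about 141, so the structure survives a radius of 150 but not one of 100. */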

    BuildFrags(str); /* Postpone building the frags until after a quick feature filtering */

    for ( i = 0, f = query->frags; i < query->numFrags; i++, f++ )
    {
        if ( q_partialMatch )
        {
            if (f->feature2PDiff )
                free((char *) f->feature2PDiff);
            if (f->feature3PDiff)
                free((char *) f->feature3PDiff);
            if ( f->featureSubsetDiff )
                free((char *) f->featureSubsetDiff);
            f->feature2PDiff = (double *) calloc(str->numFrags,sizeof(double) );
            f->feature3PDiff = (double *) calloc(str->numFrags,sizeof(double) );
            f->featureSubsetDiff = (double *) calloc(str->numFrags,sizeof(double) );
            for ( j = 0; j < str->numFrags; j++ )
            {
                f->feature2PDiff[j] = -1.0;
                f->feature3PDiff[j] = -1.0;
                f->featureSubsetDiff[j] = -1.0;
            }
            f->featureDiff = f->feature2PDiff;
        }
        else
        {
            if (f->featureDiff )
                free((char *) f->featureDiff);
            f->featureDiff = (double *) calloc(str->numFrags,sizeof(double) );
            for ( j = 0; j < str->numFrags; j++ )
            {
                f->featureDiff[j] = -1.0;
            }
        }
    }

    best = 9999.0 * 9999.0;
    bailout = radius * radius;
    best3 = best2 = bestsub = best;
    combo2 = combo3 = nskip2 = nskip3 = 0;
    /*
       2 piece feature comparisons
    */
    if ( query->s2 && str->s2 && q_do2piece )
    {
        for ( i = 0, qs2 = query->s2; i < query->s2cnt ; i++, qs2++ )
        {
            q1 = query->frags + qs2->frag1;
            q2 = query->frags + qs2->frag2;
#ifndef NO_STRMAP
            if ( !qs2->strMap || str->s2cnt > query->alloc2Map )
            {
                if ( qs2->strMap && query->alloc2Map )
                    free(qs2->strMap);
                if (str->s2cnt > 0)
                    qs2->strMap = (int *) calloc(str->s2cnt, sizeof(int) );
                else
                    qs2->strMap = (int *) 0;
            }
#endif
            if (qs2->frag1 == -1 || qs2->frag2 == -1)
                continue;
            for (j = 0, ss2 = str->s2; j < str->s2cnt; j++, ss2++ )
            {
#ifndef NO_STRMAP
                qs2->strMap[j] = 0;
                combo2++;
#endif
                if (ss2->frag1 == -1 || ss2->frag2 == -1)
                    continue;
                fs1 = str->frags + ss2->frag1;
                fs2 = str->frags + ss2->frag2;
                id1 = fs1->id;
                id2 = fs2->id;

                if ( q_partialMatch )
                {
                    PartialMatchFeatures(query, 2, q1, q2, (Frag *) 0, (Frag *) 0, str, fs1,
                        fs2, (Frag *) 0, (Frag *) 0, q_partialMatch );
                    PartialMatchFeatures(query, 2, q1, q2, (Frag *) 0, (Frag *) 0, str, fs2,
                        fs1, (Frag *) 0, (Frag *) 0, q_partialMatch );
                }
                else
                {
                    if ( q1->featureDiff[id1] == -1.0 )
                        q1->featureDiff[id1] = compareFeatures( query, q1, str, fs1, -1, -1 );
                    if ( q1->featureDiff[id2] == -1.0 )
                        q1->featureDiff[id2] = compareFeatures( query, q1, str, fs2, -1, -1 );
                    if ( q2->featureDiff[id1] == -1.0 )
                        q2->featureDiff[id1] = compareFeatures( query, q2, str, fs1, -1, -1 );
                    if ( q2->featureDiff[id2] == -1.0 )
                        q2->featureDiff[id2] = compareFeatures( query, q2, str, fs2, -1, -1 );
                }

                d1 = q1->featureDiff[id1] + q2->featureDiff[id2];
                if ( d1 < best )
                {
                    bestQ = i;
                    bestStr = j;
                    best = best2 = d1;
                    bestIdx = 0;
                }
                d2 = q1->featureDiff[id2] + q2->featureDiff[id1];
                if ( d2 < best )
                {
                    bestQ = i;
                    bestStr = j;
                    best = best2 = d2;
                    bestIdx = 1;
                }
#ifndef NO_STRMAP
                if ( d1 <= q_bailout || d2 < q_bailout )
                {
                    qs2->strMap[j] = 1;
                    nskip2++;
                }
#endif
            }
        }
        if ( str->s2cnt > query->alloc2Map )
            query->alloc2Map = str->s2cnt;
    }
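
/* Illustrative sketch, not part of the original listing. The two-piece loop
   above caches each query-fragment/structure-fragment feature difference
   (-1.0 marks "not yet computed") and then scores both orientations of the
   pairing: d1 pairs fragment 1 with 1 and 2 with 2, d2 pairs them crosswise.
   The code below isolates that pattern. The flat cache layout, the pair_cost
   callback standing in for compareFeatures, and the toy cost function are
   assumptions made for the example. */

```c
#define UNSET (-1.0)

/* Hypothetical pairwise cost; stands in for compareFeatures. */
typedef double (*pair_cost)(int q, int s);

/* Look up cost(q, s) in a row-major cache, computing it on first use. */
static double cached_cost(double *cache, int ncols, int q, int s, pair_cost cost)
{
    double *slot = &cache[q * ncols + s];

    if (*slot == UNSET)
        *slot = cost(q, s);
    return *slot;
}

/* Best of the two orientations: (q1-s1, q2-s2) or (q1-s2, q2-s1).
   *r_swapped is set to 1 when the crossed pairing is strictly better. */
static double best_pairing(double *cache, int ncols, int q1, int q2, int s1, int s2,
                           pair_cost cost, int *r_swapped)
{
    double straight = cached_cost(cache, ncols, q1, s1, cost)
                    + cached_cost(cache, ncols, q2, s2, cost);
    double crossed  = cached_cost(cache, ncols, q1, s2, cost)
                    + cached_cost(cache, ncols, q2, s1, cost);

    *r_swapped = crossed < straight;
    return *r_swapped ? crossed : straight;
}

/* Toy cost used only to exercise the cache: squared index distance. */
static int toy_calls;
static double toy_cost(int q, int s)
{
    toy_calls++;
    return (double) ((q - s) * (q - s));
}
```

/* Because all four pair costs are cached on the first call, scoring the same
   fragments in the opposite order afterwards triggers no further calls to
   the cost function. */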

    /*
       3 piece feature comparisons
    */
    for ( i = 0, qs3 = query->s3; q_do3piece && qs3 && i < query->s3cnt; i++, qs3++ )
    {
        q1 = query->frags + qs3->frag1;
        q2 = query->frags + qs3->frag2;
        q3 = query->frags + qs3->frag3;
        q4 = query->frags + qs3->frag4;
#ifndef NO_STRMAP
        if ( !qs3->strMap || str->s3cnt > query->alloc3Map )
        {
            if ( qs3->strMap && query->alloc3Map )
                free((char *) qs3->strMap);
            if (str->s3cnt > 0)
                qs3->strMap = (int *) calloc(str->s3cnt, sizeof(int) );
            else
                qs3->strMap = (int *) 0;
        }
        if (qs3->frag1 == -1 || qs3->frag2 == -1 || qs3->frag3 == -1 )
            continue;
#endif
        for (j = 0, ss3 = str->s3; ss3 && j < str->s3cnt; j++, ss3++ )
        {
#ifndef NO_STRMAP
            qs3->strMap[j] = 0;
            combo3++;
#endif
            if (ss3->frag1 == -1 || ss3->frag2 == -1 || ss3->frag3 == -1 )
                continue;
            fs1 = str->frags + ss3->frag1;
            fs2 = str->frags + ss3->frag2;
            fs3 = str->frags + ss3->frag3;
            fs4 = str->frags + ss3->frag4;
            id1 = fs1->id;
            id2 = fs2->id;
            id3 = fs3->id;
            id4 = fs4->id;

            if ( q_partialMatch )
            {
                PartialMatchFeatures(query, 3, q1, q2, q3, q4, str, fs1, fs2, fs3, fs4,
                    q_partialMatch );
                PartialMatchFeatures(query, 3, q1, q2, q3, q4, str, fs4, fs3, fs2, fs1,
                    q_partialMatch );
            }
            else
            {
                if ( q1->featureDiff[id1] == -1.0 )
                    q1->featureDiff[id1] = compareFeatures( query, q1, str, fs1, -1, -1 );
                if ( q1->featureDiff[id4] == -1.0 )
                    q1->featureDiff[id4] = compareFeatures( query, q1, str, fs4, -1, -1 );
                if ( q4->featureDiff[id1] == -1.0 )
                    q4->featureDiff[id1] = compareFeatures( query, q4, str, fs1, -1, -1 );
                if ( q4->featureDiff[id4] == -1.0 )
                    q4->featureDiff[id4] = compareFeatures( query, q4, str, fs4, -1, -1 );
                if ( q2->featureDiff[id2] == -1.0 )
                    q2->featureDiff[id2] = compareFeatures( query, q2, str, fs2, -1, -1 );
                if ( q2->featureDiff[id3] == -1.0 )
                    q2->featureDiff[id3] = compareFeatures( query, q2, str, fs3, -1, -1 );
                if ( q3->featureDiff[id3] == -1.0 )
                    q3->featureDiff[id3] = compareFeatures( query, q3, str, fs3, -1, -1 );
                if ( q3->featureDiff[id2] == -1.0 )
                    q3->featureDiff[id2] = compareFeatures( query, q3, str, fs2, -1, -1 );
            }


            attPen[0] = attPen[1] = 0.0;
            dval[0] = 0.0;
            dval[1] = 0.0;

            if ( q_attachPenFactor > 0.0 )
            {
                attPen[0] = ( computeAttachmentPenalty( q1, fs1, q4, fs4 ) +
                    computeAttachmentPenalty(q4, fs4, q1, fs1) );
                attPen[1] = ( computeAttachmentPenalty( q1, fs4, q4, fs1 ) +
                    computeAttachmentPenalty(q4, fs1, q1, fs4) );
                dval[0] += attPen[0];
                dval[1] += attPen[1];
            }
            if ( q_featureFactor > 0.0 )
            {
                dval[0] += ( q1->featureDiff[id1] + q4->featureDiff[id4] ) / 2.0 +
                    q2->featureDiff[id2] + q3->featureDiff[id3];
                dval[1] += ( q1->featureDiff[id4] + q4->featureDiff[id1] ) / 2.0 +
                    q2->featureDiff[id3] + q3->featureDiff[id2];
            }
            max3 = 2;
            for ( k = 0; k < max3; k++ )
            {
                if ( dval[k] < best )
                {
                    best = best3 = dval[k];
                    bestQ = i;
                    bestStr = j;
                    bestIdx = k;
                    threeIsBetter = 1;
                }
                else if ( dval[k] < best3 )
                    best3 = dval[k];
#ifndef NO_STRMAP
                if ( dval[k] <= q_bailout && qs3->strMap[j] == 0 )
                {
                    qs3->strMap[j] = 1;
                    nskip3++;
                }
#endif
            }
        }
    }
    if ( str->s3cnt > query->alloc3Map )
        query->alloc3Map = str->s3cnt;


/*
subset feature comparisons

Compare the query 2 piece fragmentation with the structure 3 piece fragmentation. Match A-B in the
query with A-B or B-C in the structure, where B is the center piece of the structure.

When comparing two piece with three piece: frag1 and frag2 are a set, while frag3 and frag4 are a set,
in that the attachment bond that is broken defines the connection between frag1 and frag2. Frag3 and
frag4 are the second split. Frag1 and frag4 are the center/core fragments, aligned from the different
starting attachment atom.
*/


if ( query->s2 && str->s3 && q_doSubset )
{
    /* loop over query 2 piece fragments, and compare with structure 3 piece
       fragments. */
    for ( i = 0, qs2 = query->s2; i < query->s2cnt; i++, qs2++ )
    {
        if ( qs2->frag1 == -1 || qs2->frag2 == -1 )
            continue;
        q1 = query->frags + qs2->frag1;
        q2 = query->frags + qs2->frag2;
#ifndef NO_STRMAP
        if ( !qs2->subsetMap || str->s3cnt > query->allocSubsetMap )
        {
            if ( qs2->subsetMap && query->allocSubsetMap )
                free(qs2->subsetMap);
            if ( str->s3cnt > 0 )
                qs2->subsetMap = (int *) calloc(str->s3cnt, sizeof(int) );
            else
                qs2->subsetMap = (int *) 0;
        }
#endif
        for ( j = 0, ss3 = str->s3; ss3 && j < str->s3cnt; j++, ss3++ )
        {
            if ( ss3->frag1 == -1 || ss3->frag2 == -1 || ss3->frag3 == -1 )
                continue;
#ifndef NO_STRMAP
            qs2->subsetMap[j] = 0;
#endif
            fs1 = str->frags + ss3->frag1;
            fs2 = str->frags + ss3->frag2;
            fs3 = str->frags + ss3->frag3;
            fs4 = str->frags + ss3->frag4;
            id1 = fs1->id;
            id2 = fs2->id;
            id3 = fs3->id;
            id4 = fs4->id;


            if ( q_partialMatch )
            {
                PartialMatchFeatures(query, 1, q1, q2, (Frag *) 0, (Frag *) 0, str, fs1,
                    fs2, (Frag *) 0, (Frag *) 0, q_partialMatch );
                PartialMatchFeatures(query, 1, q1, q2, (Frag *) 0, (Frag *) 0, str, fs2,
                    fs1, (Frag *) 0, (Frag *) 0, q_partialMatch );
                PartialMatchFeatures(query, 1, q1, q2, (Frag *) 0, (Frag *) 0, str, fs3,
                    fs4, (Frag *) 0, (Frag *) 0, q_partialMatch );
                PartialMatchFeatures(query, 1, q1, q2, (Frag *) 0, (Frag *) 0, str, fs4,
                    fs3, (Frag *) 0, (Frag *) 0, q_partialMatch );
            }
            else
            {
                if ( q1->featureDiff[id1] == -1.0 )
                    q1->featureDiff[id1] = compareFeatures( query, q1, str, fs1, -1, -1 );
                if ( q1->featureDiff[id2] == -1.0 )
                    q1->featureDiff[id2] = compareFeatures( query, q1, str, fs2, -1, -1 );

                if ( q2->featureDiff[id1] == -1.0 )
                    q2->featureDiff[id1] = compareFeatures( query, q2, str, fs1, -1, -1 );
                if ( q2->featureDiff[id2] == -1.0 )
                    q2->featureDiff[id2] = compareFeatures( query, q2, str, fs2, -1, -1 );

                if ( q1->featureDiff[id3] == -1.0 )
                    q1->featureDiff[id3] = compareFeatures( query, q1, str, fs3, -1, -1 );
                if ( q1->featureDiff[id4] == -1.0 )
                    q1->featureDiff[id4] = compareFeatures( query, q1, str, fs4, -1, -1 );

                if ( q2->featureDiff[id3] == -1.0 )
                    q2->featureDiff[id3] = compareFeatures( query, q2, str, fs3, -1, -1 );
                if ( q2->featureDiff[id4] == -1.0 )
                    q2->featureDiff[id4] = compareFeatures( query, q2, str, fs4, -1, -1 );
            }

            if ( q_featureFactor > 0.0 )
            {
                dval[0] = q1->featureDiff[id1] + q2->featureDiff[id2];
                dval[1] = q1->featureDiff[id2] + q2->featureDiff[id1];
                dval[2] = q1->featureDiff[id3] + q2->featureDiff[id4];
                dval[3] = q1->featureDiff[id4] + q2->featureDiff[id3];
            }
            else
                dval[0] = dval[1] = dval[2] = dval[3] = 0.0;

            hevCnts[0] = hevCnts[1] = fs1->hevCnt + fs2->hevCnt;
            hevCnts[2] = hevCnts[3] = fs3->hevCnt + fs4->hevCnt;

            max3 = 4;
            for ( k = 0; k < max3; k++ )
            {
                if ( hevCnts[k] > q_minSubsetSize )
                {
                    if ( dval[k] < best )
                    {
                        best = bestsub = dval[k];
                        bestQ = i;
                        bestStr = j;
                        bestIdx = k;
                        SubIsBetter = 1;
                    }
                    else if ( dval[k] < bestsub )
                    {
                        bestsub = dval[k];
                    }
                }
                if ( dval[k] <= q_bailout && qs2->subsetMap[j] == 0 )
                {
                    qs2->subsetMap[j] = 1;
                }
            }
        }
    }

    if ( str->s3cnt > query->allocSubsetMap )
        query->allocSubsetMap = str->s3cnt;
} /* end of subset */


if ( best < 0.0 ) 
best = 0.0; 

#if 0
fprintf(stderr, "%d of %d 2p skipped %d of %d 3p skipped best: %8.4lf \n",
        combo2 - nskip2, combo2, combo3 - nskip3, combo3, sqrt(best) );
#endif

return sqrt(best); 

} 
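The subset comparison above scores four pairings of the two query fragments against the two halves of a three piece structure split, and skips pairings whose structure fragments are too small. A minimal sketch of that selection, in isolation, might look as follows. The function name `sketch_best_subset_pairing`, the `d[q][s]` matrix layout and the purely local `best` are assumptions made for illustration; the listing itself keeps a running `best` across all comparisons and caches differences in `featureDiff`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the original listing.
 * d[q][s] is the feature difference between query fragment q (0..1)
 * and structure fragment s (0..3, i.e. fs1..fs4).
 * hev[s] is the heavy atom count of structure fragment s.
 * Returns the index (0..3) of the smallest allowed pairing and stores
 * its value in *best, or returns -1 if no pairing passes the size test. */
static int sketch_best_subset_pairing(double d[2][4], const int hev[4],
                                      int min_size, double *best)
{
    double dval[4];
    int hev_cnt[4];
    int k, best_k = -1;

    assert(d != NULL && hev != NULL && best != NULL);

    /* same four pairings as dval[0..3] in the listing */
    dval[0] = d[0][0] + d[1][1];
    dval[1] = d[0][1] + d[1][0];
    dval[2] = d[0][2] + d[1][3];
    dval[3] = d[0][3] + d[1][2];

    /* pairings 0/1 use fs1+fs2, pairings 2/3 use fs3+fs4 */
    hev_cnt[0] = hev_cnt[1] = hev[0] + hev[1];
    hev_cnt[2] = hev_cnt[3] = hev[2] + hev[3];

    for (k = 0; k < 4; k++) {
        if (hev_cnt[k] > min_size && (best_k < 0 || dval[k] < *best)) {
            *best = dval[k];
            best_k = k;
        }
    }
    return best_k;
}
```

Raising `min_size` removes the pairings built from small fragments, so a low-scoring but tiny fragment pair cannot win the comparison.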


void TOP_FREE_RESULT(top_result *res, int freeRef )
{
    int i;

    if ( !res )
        return;

    for (i = 0; i < 3; i++ )
    {
        if ( res->strFrags[i] )
            DB_CT_DELETE_CT( res->strFrags[i] );
    }

    if (freeRef)
        free((char *) res );
}

static char tempString[200]; 

struct top_graph *TOP_INIT_GRAPH( struct top_graph *g, struct CtConnectionTable *ct ) {
/* ============================================================ */
/* (re) initializes topomer graph info *g for structure *ct */

    int b, nowats, nowbds, nowmax, ntoats, toats[20], ntoats2, na, nb, bct, inRing;
    struct top_graph *gnew;
    struct bond_top_rec *bptr;
    set_ptr end_atoms = NIL, nuls = NIL, cnats = NIL, nxcn = NIL, a2chk = NIL,
            TOP_CONN_ATOMS();
    CtBondTypeDef bType;

    if (!DB_CT_GET_CT_ATTR( ct, CtCtAtomCount, &nowats ) ||
        !DB_CT_GET_CT_ATTR( ct, CtCtBondCount, &nowbds ) )
        goto error;

    /* be sure rings were perceived */
    if (!DB_CT_UTL_FIND_RINGS( ct )) goto error;

    /* (re)allocate all memory required by this structure, excepting sets of to_atts */
    if (g) {
        /* free all dependent memory */
        for (b = 0; b < g->nbonds; b++) if (g->bstuff[b].detail) {
            if (!UTL_SET_DESTROY( g->bstuff[b].detail->to_atts ) ) goto error;
            if (!UTL_MEM_FREE( g->bstuff[b].detail ) ) goto error;
            g->bstuff[b].detail = (struct bond_detail_rec *) 0;
        }

        /* if this molecule is bigger, reallocate dependent data arrays */
        if (nowats > g->maxatoms) {
            nowmax = (nowats > g->maxatoms * 2 ? nowats : g->maxatoms * 2 );
            if (!( g->bstart = (int *) DB_CT_UTL_REALLOC(
                    (char *) g->bstart, sizeof(int) * nowmax ) ) ) goto error;
            g->maxatoms = nowmax;
        }

        /* note that bonds are 2x more because they are stored rooted from both ends */
        if (2 * nowbds > g->maxbonds ) {
            nowmax = (2 * nowbds > g->maxbonds ? 2 * nowbds : g->maxbonds );
            if (!( g->bstuff = (struct bond_top_rec *) DB_CT_UTL_REALLOC(
                    (char *) g->bstuff, sizeof(struct bond_top_rec) * nowmax ) ) ) goto error;
            g->maxbonds = nowmax;
        }
        gnew = g;
    }
    else {
        if (!(gnew = (struct top_graph *) UTL_MEM_ALLOC( sizeof( struct top_graph ) ) ))
            goto error;
        if (!(gnew->bstart = (int *) UTL_MEM_ALLOC( sizeof( int ) * 1000 ) )) goto error;
        gnew->maxatoms = 1000;
        if (!(gnew->bstuff = (struct bond_top_rec *)
                UTL_MEM_ALLOC( sizeof(struct bond_top_rec) * 2000 ) )) goto error;
        gnew->maxbonds = 2000;
    }

    gnew->natoms = nowats;
    gnew->nbonds = nowbds;

    if (!(a2chk = UTL_SET_CREATE( nowats + 1 ) )) goto error;
    if (!(nuls = UTL_SET_CREATE( nowats + 1 ) )) goto error;
    if (!(cnats = UTL_SET_CREATE( nowats + 1 ) )) goto error;
    if (!(nxcn = UTL_SET_CREATE( nowats + 1 ) )) goto error;
    if (!(end_atoms = UTL_SET_CREATE( nowats + 1 ) )) goto error;

    /* fill in tree information */
    bptr = gnew->bstuff;
    bct = 0;
    for (na = 1; na <= nowats; na++ ) {
        if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, na, CtAtomBondCount, &ntoats ) )) goto error;
        if (ntoats > 20) {
            fprintf( stderr, "More than 20 bonds to atom %d.\n", na );
            goto error;
        }
        if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, na, CtAtomBondToAtoms, &toats ) )) goto error;
        gnew->bstart[na - 1] = bct;
        for (nb = 0; nb < ntoats; nb++, bct++, bptr++ ) {
            bptr->from = na;
            bptr->to = toats[ nb ];
            /* is this a topomerically labile bond? */
            if (!(b = DB_CT_UTL_GET_BONDID( ct, na, bptr->to ) )) goto error;
            if (!DB_CT_GET_BOND_ATTR( ct, b, CtBondIsInRing, &inRing )
                || !DB_CT_GET_BOND_ATTR( ct, b, CtBondType, &bType )
                || !DB_CT_GET_ANY_ATOM_ATTR( ct, toats[ nb ],
                        CtAtomBondCount, &ntoats2 ) ) goto error;
            if (!inRing && bType == CtBondTypeSingle && ntoats > 1 && ntoats2 > 1 ) {
/*
                if (!(bptr->to_atts = TOP_CONN_ATOMS( ct, bptr->to, bptr->from,
                        nuls, cnats, nxcn, end_atoms ) )) goto error;
*/
                if (!(TOP_MARK_BEST( ct, bptr->to, bptr->from, TRUE, bptr, NIL,
                        NIL, NIL,
                        a2chk, nuls, cnats, nxcn, end_atoms ) )) goto error;
            }
            else bptr->detail = (struct bond_detail_rec *) 0;
        }
    }

    if (end_atoms) UTL_SET_DESTROY( end_atoms );
    if (nuls) UTL_SET_DESTROY( nuls );
    if (nxcn) UTL_SET_DESTROY( nxcn );
    if (cnats) UTL_SET_DESTROY( cnats );
    /* if(a2chk) UTL_SET_DESTROY(a2chk); jilek (to do) was cnats */

    return gnew;

error:
    return (struct top_graph *) 0;
}
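The routine above stores every bond twice, once rooted at each end, and records in `bstart` where each atom's run of bond records begins. A minimal sketch of that layout, under simplifying assumptions, is given below. The struct `sketch_graph`, the function `sketch_build`, the fixed array sizes and the extra sentinel entry `bstart[natoms]` are all hypothetical and introduced only for illustration; the listing obtains neighbours from the connection table rather than from a bond list.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_ATOMS 16
#define SKETCH_MAX_BONDS 32

/* Illustrative stand-in for the bstart/bstuff pair (names are assumptions). */
struct sketch_graph {
    int natoms;
    int nrecs;                         /* directed records: two per bond */
    int bstart[SKETCH_MAX_ATOMS + 1];  /* bstart[a-1] = first record rooted at atom a */
    int from[2 * SKETCH_MAX_BONDS];
    int to[2 * SKETCH_MAX_BONDS];
};

/* pairs holds nbonds bonds as consecutive (a, b) atom IDs, numbered from 1. */
static void sketch_build(struct sketch_graph *g, int natoms,
                         const int *pairs, int nbonds)
{
    int na, nb, cnt = 0;

    assert(g != NULL && pairs != NULL);
    assert(natoms >= 1 && natoms <= SKETCH_MAX_ATOMS);
    assert(nbonds >= 0 && nbonds <= SKETCH_MAX_BONDS);

    for (na = 1; na <= natoms; na++) {
        g->bstart[na - 1] = cnt;
        for (nb = 0; nb < nbonds; nb++) {
            int a = pairs[2 * nb], b = pairs[2 * nb + 1];
            if (a == na) {            /* bond seen from its first end */
                g->from[cnt] = na; g->to[cnt] = b; cnt++;
            } else if (b == na) {     /* same bond seen from its other end */
                g->from[cnt] = na; g->to[cnt] = a; cnt++;
            }
        }
    }
    g->bstart[natoms] = cnt;          /* sentinel, an addition of this sketch */
    g->natoms = natoms;
    g->nrecs = cnt;
}
```

With this layout the records rooted at atom `a` occupy indices `bstart[a-1]` up to, but not including, `bstart[a]`, which is why the bond array needs room for twice the bond count.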

set_ptr TOP_CONN_ATOMS(
/* ============================================================ */
/* returns the set of all atoms in *ct which are attached to atom1,
   except that any path ending in atom2 is truncated.
   The returned set is created here (to be freed by user when finished).
   For efficiency in reprocessing the same structure,
   four working sets are supplied by caller */
    struct CtConnectionTable *ct,
    int atom1,
    int atom2,
    set_ptr nuls, set_ptr cnats, set_ptr nxcn, set_ptr end_atoms )
{
    int natot, ntoats, toats[20], natt, nats, elem, nuats;
    set_ptr a2chk = NIL;

    if (!DB_CT_GET_CT_ATTR( ct, CtCtAtomCount, &natot )) goto error;
    UTL_SET_CLEAR( end_atoms );
    UTL_SET_INSERT( end_atoms, atom2 );

    if (!(a2chk = UTL_SET_CREATE( natot + 1 ) )) goto error;
    /* root at first set of attached atoms */
    if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, atom1, CtAtomBondCount, &ntoats ) )) goto error;
    if (ntoats > 20) goto toomanyattms;
    if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, atom1, CtAtomBondToAtoms, &toats ) )) goto error;
    for (natt = 0; natt < ntoats; natt++) UTL_SET_INSERT( a2chk, toats[ natt ] );
    if (UTL_SET_EMPTY( a2chk )) return( FALSE );
    UTL_SET_DIFF_INPLACE( a2chk, end_atoms, a2chk );
    nats = UTL_SET_CARDINALITY( a2chk );
    UTL_SET_COPY_INPLACE( cnats, a2chk );

    /* breadth first search */
    while (TRUE) {
        UTL_SET_CLEAR( nxcn );
        elem = -1;
        while ( (elem = UTL_SET_NEXT( cnats, elem )) >= 0 ) {
            UTL_SET_CLEAR( nuls );
            if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, elem, CtAtomBondCount, &ntoats ) )) goto error;
            if (ntoats > 20) goto toomanyattms;
            if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, elem, CtAtomBondToAtoms, &toats ) )) goto error;
            for (natt = 0; natt < ntoats; natt++) UTL_SET_INSERT( nuls, toats[ natt ] );
            UTL_SET_DELETE( nuls, atom1 );
            UTL_SET_DIFF_INPLACE( nuls, end_atoms, nuls );
            UTL_SET_OR_INPLACE( nxcn, nuls, nxcn );
            UTL_SET_DIFF_INPLACE( nxcn, a2chk, nxcn );
        }
        UTL_SET_OR_INPLACE( a2chk, nxcn, a2chk );
        nuats = UTL_SET_CARDINALITY( a2chk );
        if (nuats <= nats) break;
        nats = nuats;
        UTL_SET_COPY_INPLACE( cnats, nxcn );
    }
    return a2chk;

error:
    return (set_ptr) NIL;

toomanyattms:
    fprintf( stderr, "More than twenty atoms attached to some atom in this structure\n" );
    goto error;
}
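The breadth first search in `TOP_CONN_ATOMS` collects every atom reachable from `atom1` while refusing to pass through `atom2`. A minimal sketch of the same idea on a plain adjacency matrix follows. The names `sketch_add_bond` and `sketch_conn_atoms`, the flat matrix layout and the explicit queue are assumptions for illustration; the listing works on the `UTL_SET_*` bit sets instead and grows the result one layer at a time.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_ATOMS 32

/* adj is a flat (natoms+1) x (natoms+1) matrix; atoms are numbered from 1. */
static void sketch_add_bond(unsigned char *adj, int natoms, int a, int b)
{
    assert(adj != NULL);
    assert(a >= 1 && a <= natoms && b >= 1 && b <= natoms);
    adj[a * (natoms + 1) + b] = 1;
    adj[b * (natoms + 1) + a] = 1;
}

/* Marks in_set[x] = 1 for every atom x reachable from atom1 without
 * passing through atom2; atom1 and atom2 themselves are never marked.
 * in_set must have natoms + 1 entries. Returns the number of marked atoms. */
static int sketch_conn_atoms(const unsigned char *adj, int natoms,
                             int atom1, int atom2, unsigned char *in_set)
{
    int queue[SKETCH_MAX_ATOMS];
    int head = 0, tail = 0, a, b, count = 0;

    assert(adj != NULL && in_set != NULL);
    assert(natoms >= 1 && natoms <= SKETCH_MAX_ATOMS);

    for (a = 0; a <= natoms; a++)
        in_set[a] = 0;

    /* seed with the neighbours of atom1, skipping the blocked atom2 */
    for (b = 1; b <= natoms; b++) {
        if (adj[atom1 * (natoms + 1) + b] && b != atom2) {
            in_set[b] = 1;
            queue[tail++] = b;
            count++;
        }
    }

    /* expand outward; each atom enters the queue at most once */
    while (head < tail) {
        a = queue[head++];
        for (b = 1; b <= natoms; b++) {
            if (!adj[a * (natoms + 1) + b])
                continue;
            if (b == atom1 || b == atom2 || in_set[b])
                continue;
            in_set[b] = 1;
            queue[tail++] = b;
            count++;
        }
    }
    return count;
}
```

For a branched chain with bonds 1-2, 2-3, 3-4, 3-5 and 1-6, calling the function with `atom1 = 2` and `atom2 = 1` marks atoms 3, 4 and 5, which is the side of the 1-2 bond that would rotate with atom 2.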

int TOP_MARK_BEST(
/* ============================================================ */
/* adds information for prioritizing attachments to an atom */
    struct CtConnectionTable *ct,
    int a1,                    /* the root atom */
    int a2,                    /* the base of the root - skip it */
    int full_data,             /* provide information relating to near symmetries? + attached
                                  sets */
    struct bond_top_rec *bptr, /* output here if full_data=TRUE */
    int *only_atoms,           /* output here if full_data=FALSE */
    double *coo_in,            /* atomic coords (retrieved from ct if not provided) */
    set_ptr attach3set,        /* if provided, a super root atom(s)
                                  for entire group (highest priority path is shortest to here) */
    set_ptr a2chk, set_ptr nuls, set_ptr cnats, set_ptr nxcn, set_ptr end_atoms )
{
#define MAX_NP 8
    struct pathrec {
        int root, nrings, chosen, nats, done, a3id;
        double mw;
        set_ptr path, nxtls;
    };
    struct pathrec p[MAX_NP];
    int retval, toroot, ntoats, toats[20], natt, a, np, growing, nats, natot, ncycles, pnow, ringclosed,
        debug = FALSE;
    int nuats, elem, new_rings, pdone, p2do, best, decision, naout, lastnats = 0, lastdecision, arec2,
        a4;
    double *coo, t1, t2, diff, pot1, pot2, podiff, get_path_mw();

    np = 0;
    if (!(coo = coo_in)) {
        if (!DB_CT_GET_CT_ATTR( ct, CtCt3DCoordSet, &coo, &natot )) goto error;
    } else if (!DB_CT_GET_CT_ATTR( ct, CtCtAtomCount, &natot )) goto error;

    if (full_data) if (!( bptr->detail = (struct bond_detail_rec *)
            UTL_MEM_CALLOC( sizeof( struct bond_detail_rec ), 1 ) )) goto error;

    toroot = attach3set || !a2;
    UTL_SET_CLEAR( end_atoms );
    if (a2) UTL_SET_INSERT( end_atoms, a2 );
    arec2 = a2;

    UTL_SET_CLEAR( a2chk );
    if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, a1, CtAtomBondCount, &ntoats ) )) goto error;
    if (ntoats > 20) goto toomanyattms;
    if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, a1, CtAtomBondToAtoms, &toats ) )) goto error;
    for (natt = 0; natt < ntoats; natt++) UTL_SET_INSERT( a2chk, toats[ natt ] );
    if (a2) UTL_SET_DELETE( a2chk, a2 );

    /* initialize path records */
    a = -1;
    np = 0;
    while (np < MAX_NP && (a = UTL_SET_NEXT( a2chk, a )) >= 0 ) {
        if (!(p[np].path = UTL_SET_CREATE( natot + 1 ) )) goto error;
        if (!(p[np].nxtls = UTL_SET_CREATE( natot + 1 ) )) goto error;
        p[np].root = a;
        p[np].nrings = p[np].done = p[np].a3id = 0;
        UTL_SET_INSERT( p[np].path, a );
        np++;
    }

    /* grow the paths */
    growing = TRUE;
    nats = 0;
    ncycles = 0;
    while (growing ) {
        nuats = 0;
        ringclosed = FALSE;
        for (pnow = 0; pnow < np; pnow++ ) if (!p[pnow].done) {
            UTL_SET_COPY_INPLACE( cnats, p[pnow].path );
            UTL_SET_CLEAR( nxcn );
            elem = -1;
            /* accumulate this generation of attached atoms into nxcn */
            while ( (elem = UTL_SET_NEXT( cnats, elem )) >= 0 ) {
                UTL_SET_CLEAR( nuls );
                if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, elem, CtAtomBondCount, &ntoats ) )) goto error;
                if (ntoats > 20) goto toomanyattms;
                if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, elem, CtAtomBondToAtoms, &toats ) )) goto
                    error;
                for (natt = 0; natt < ntoats; natt++) UTL_SET_INSERT( nuls, toats[ natt ] );
                UTL_SET_DELETE( nuls, a1 );
                UTL_SET_DIFF_INPLACE( nuls, end_atoms, nuls );
                UTL_SET_OR_INPLACE( nxcn, nuls, nxcn );
                UTL_SET_DIFF_INPLACE( nxcn, p[pnow].path, nxcn );
            }
            UTL_SET_COPY_INPLACE( p[pnow].nxtls, nxcn );
        }

        for (pnow = 0; pnow < np; pnow++) {
            /* remove duplicate atoms caused by new ring closure */
            for (pdone = 0; pdone < np; pdone++ ) if (pdone != pnow) {
                UTL_SET_AND_INPLACE( p[pnow].path, p[pdone].nxtls, a2chk );
                if ((new_rings = UTL_SET_CARDINALITY( a2chk ))) {
                    /* we have ring closure(s) */
                    ringclosed = TRUE;
                    UTL_SET_OR_INPLACE( end_atoms, a2chk, end_atoms );
                    UTL_SET_DIFF_INPLACE( p[pdone].nxtls, a2chk, p[pdone].nxtls );
                }
            }
            /* stop growing a path that has reached anything in attach3set */
            if (toroot) {
                elem = -1;
                while ((elem = UTL_SET_NEXT( attach3set, elem )) >= 0 ) {
                    if (UTL_SET_MEMBER( p[pnow].path, elem ) ) {
                        p[pnow].done = TRUE;
                        break;
                    }
                }
            }
        }

        /* add all OK new atoms to all paths */
        for (pnow = 0; pnow < np; pnow++) {
            UTL_SET_OR_INPLACE( p[pnow].path, p[pnow].nxtls, p[pnow].path );
            UTL_SET_CLEAR( p[pnow].nxtls );
        }

        /* done growing paths if no more atoms added to any path .. */
        for (pdone = 0, nuats = 0; pdone < np; pdone++ )
            nuats += UTL_SET_CARDINALITY( p[pdone].path );
        if (nuats <= nats && !ringclosed) growing = FALSE;
        nats = nuats;

        /* .. or after 100 atom layers out regardless */
        ncycles++;
        if (ncycles >= 100) growing = FALSE;
    }

    /* debugging */
    if (debug) for (pdone = 0; pdone < np; pdone++) {
        sprintf( tempString, "Path %d (from %d): ",
                 pdone+1, p[pdone].root );
        fprintf( stdout, tempString );
        ashow( p[pdone].path );
    }

    if (full_data) {
        if (!( bptr->detail->to_atts = UTL_SET_CREATE( natot + 1 ) )) goto error;
        UTL_SET_INSERT( bptr->detail->to_atts, a1 );
    }

    /* compute the path properties */
    for (pdone = 0; pdone < np; pdone++) {
        p[pdone].chosen = toroot;
        if (toroot) {
            p[pdone].chosen = FALSE;
            elem = -1;
            while ((elem = UTL_SET_NEXT( attach3set, elem )) >= 0 ) {
                if (UTL_SET_MEMBER( p[pdone].path, elem ) ) {
                    /* recording atom ID for later use */
                    p[pdone].chosen = TRUE;
                    p[pdone].a3id = elem;
                    arec2 = p[pdone].root;
                    break;
                }
            }
        }
        p[pdone].nats = UTL_SET_CARDINALITY( p[pdone].path );
        p[pdone].nrings = p[pdone].nrings ? 1 : 0;
        p[pdone].mw = 0.0;
        p[pdone].done = 0;
        if (full_data) UTL_SET_OR_INPLACE( bptr->detail->to_atts, p[pdone].path,
                bptr->detail->to_atts );
    }


    /* return all root atoms, ordered best to worst */
    for (p2do = 0; p2do < np; p2do++ ) {
        /* start with first unchosen atom */
        for (pdone = 0; pdone < np; pdone++) if (!p[pdone].done) {
            best = pdone;
            break;
        }

        /* look for something better */
        for (pdone = 0; pdone < np; pdone++) if (!p[pdone].done && pdone != best) {
            decision = FALSE;
            if (p[best].chosen != p[pdone].chosen) {
                decision = TRUE;
                if (!p[best].chosen && p[pdone].chosen) best = pdone;
            }
            if (!decision) {
                if (p[pdone].nats != p[best].nats ) {
                    decision = TRUE;
                    if (p[pdone].nats > p[best].nats) best = pdone;
                }
            }
            if (!decision) {
                p[pdone].mw = get_path_mw( p[pdone].path, ct, p[pdone].mw );
                p[best].mw = get_path_mw( p[best].path, ct, p[best].mw );
                if (p[pdone].mw - p[best].mw > 0.01 * p[best].mw ||
                    p[pdone].mw - p[best].mw < -0.01 * p[best].mw ) {
                    decision = TRUE;
                    if (p[pdone].mw - p[best].mw > 0.01 * p[best].mw) best = pdone;
                }
            }

            /* checking relative geometries of attachments via "improper" torsion */

            /* the phenyl ether problem - if candidates are 180 degrees apart and we are on the
               root side of the torsion, pick the atom to the "right", not the "left", of the main chain */
            if (!decision && toroot && p[pdone].a3id ) {
                /* are we 180 apart? */
                a4 = p[pdone].a3id;
                pot1 = UTL_GEOM_TAU( coo+(a4-1)*3, coo+(a1-1)*3, coo+(arec2-1)*3,
                        coo+(p[best].root-1)*3 );
                pot2 = UTL_GEOM_TAU( coo+(a4-1)*3, coo+(a1-1)*3, coo+(arec2-1)*3,
                        coo+(p[pdone].root-1)*3 );
                podiff = pot1 - pot2;
                while (podiff < 0.0) podiff += 360.0;
                while (pot2 < 0.0) pot2 += 360.0;
                if (podiff < 190.0 && podiff > 170.0 ) {
                    decision = TRUE;
                    if (pot2 < 180.0) best = pdone;
                }
            }

            if (!decision) {
                /* if not already set, according to the previous special case, then */
                /* if torsions differ by 360 degrees then we have trans, prefer the +180 */
                t1 = UTL_GEOM_TAU( coo+(p[pdone].root-1)*3, coo+(a1-1)*3, coo+(arec2-1)*3,
                        coo+(p[best].root-1)*3 );
                t2 = UTL_GEOM_TAU( coo+(p[best].root-1)*3, coo+(a1-1)*3, coo+(arec2-1)*3,
                        coo+(p[pdone].root-1)*3 );
                diff = t1 - t2;
                if (diff > 355.0) best = pdone;
                else if (diff > -355.0) {
                    while (t1 < 0.0) t1 += 360.0;
                    if (t1 > 170.0 && t1 <= 350.0) best = pdone;
                }
            }
        }

        /* output all information about this atom */
        if (p2do < 3) {
            if (full_data) {
                if (p2do) {
                    bptr->detail->identical[ p2do - 1 ] = lastdecision ? 1 : 0;
                    bptr->detail->nat1vs2[ p2do - 1 ] = lastnats - p[best].nats;
                    bptr->detail->lastnat[ p2do - 1 ] = p[best].nats;
                }
                bptr->detail->best[ p2do ] = p[best].root;
            } else only_atoms[ p2do ] = p[best].root;
        }

        lastnats = p[best].nats;
        lastdecision = decision;
        p[best].done = TRUE;
    }

    retval = TRUE;
error:
    retval = TRUE;
    for (pnow = 0; pnow < np; pnow++ ) {
        if (p[pnow].path) UTL_SET_DESTROY( p[pnow].path );
        if (p[pnow].nxtls) UTL_SET_DESTROY( p[pnow].nxtls );
    }
    return( retval );

toomanyattms:
    fprintf( stderr, "Too many attachments to an atom (>20)\n" );
    goto error;
}
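`TOP_MARK_BEST` ranks the paths leaving the root atom by a cascade of tests: a path that reaches the attachment set wins, then the path with more atoms, then the path whose molecular weight is higher by more than one percent. A minimal sketch of the first three tests as a standalone comparison follows. The struct `sketch_path` and the function `sketch_outranks` are hypothetical names for illustration, and the torsion based tie-breaks that the listing applies afterwards are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced path record holding only the fields the cascade needs. */
struct sketch_path {
    int chosen;   /* nonzero if the path reaches the attachment set */
    int nats;     /* number of atoms on the path */
    double mw;    /* summed atomic weight of the path */
};

/* Returns 1 if cand should replace best, 0 if best stays ahead, and -1
 * if the three tests cannot separate them (the listing then falls
 * through to its torsion comparisons). */
static int sketch_outranks(const struct sketch_path *cand,
                           const struct sketch_path *best)
{
    double d;

    assert(cand != NULL && best != NULL);

    if (cand->chosen != best->chosen)
        return cand->chosen ? 1 : 0;
    if (cand->nats != best->nats)
        return cand->nats > best->nats ? 1 : 0;

    d = cand->mw - best->mw;
    if (d > 0.01 * best->mw)
        return 1;
    if (d < -0.01 * best->mw)
        return 0;
    return -1;
}
```

The one percent band means that two paths of nearly equal weight are treated as equivalent at this stage, so that geometry rather than rounding noise decides between them.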

#if 0

/* adds information for prioritizing attachments to an atom */
static int topMarkBest(
    Frag *fragP,
    struct CtConnectionTable *ct,
    int *atoms,    /* sizeof ct->atomCount, true false for each atom to use */
    int a1,        /* the root atom */
    int a2,        /* the base of the root - skip it */
    int full_data, /* provide information relating to near symmetries? + attached
                      sets */
{
#if 0
    struct bond_top_rec *bptr, /* output here if full_data=TRUE */
    int *only_atoms,           /* output here if full_data=FALSE */
    double *coo_in,            /* atomic coords (retrieved from ct if not provided) */
    set_ptr attach3set,        /* if provided, a super root atom(s)
                                  for entire group (highest priority path is shortest to here) */
    set_ptr a2chk, set_ptr nuls, set_ptr cnats, set_ptr nxcn, set_ptr end_atoms )
#endif

#define MAX_NP 8
    struct pathrec {
        int root, nrings, chosen, nats, done, a3id;
        double mw;
        set_ptr path, nxtls;
    };
    struct pathrec p[MAX_NP];
    int retval, toroot, ntoats, toats[20], natt, a, np, growing, nats, natot, ncycles, pnow, ringclosed,
        debug = FALSE;
    int nuats, elem, new_rings, pdone, p2do, best, decision, naout, lastnats = 0, lastdecision, arec2,
        a4;
    double *coo, t1, t2, diff, pot1, pot2, podiff, get_path_mw();
    set_ptr a2chk;


    np = 0;
    if (!DB_CT_GET_CT_ATTR( ct, CtCt3DCoordSet, &coo, &natot )) goto error;
    natot = ct->atomCount;

#if 0
    toroot = attach3set || !a2;
    UTL_SET_CLEAR( end_atoms );
    if (a2) UTL_SET_INSERT( end_atoms, a2 );
    arec2 = a2;
#endif

    a2chk = UTL_SET_CREATE( natot + 1 );
    UTL_SET_CLEAR( a2chk );
    if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, a1, CtAtomBondCount, &ntoats ) )) goto error;
    if (ntoats > 20) goto toomanyattms;
    if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, a1, CtAtomBondToAtoms, &toats ) )) goto error;
    for (natt = 0; natt < ntoats; natt++) UTL_SET_INSERT( a2chk, toats[ natt ] );
#if 0
    if (a2) UTL_SET_DELETE( a2chk, a2 );
#endif

    /* initialize path records */
    a = -1;
    np = 0;
    while (np < MAX_NP && (a = UTL_SET_NEXT( a2chk, a )) >= 0 ) {
        if (!(p[np].path = UTL_SET_CREATE( natot + 1 ) )) goto error;
        if (!(p[np].nxtls = UTL_SET_CREATE( natot + 1 ) )) goto error;
        p[np].root = a;
        p[np].nrings = p[np].done = p[np].a3id = 0;
        UTL_SET_INSERT( p[np].path, a );
        np++;
    }

    /* grow the paths */
    growing = TRUE;
    nats = 0;
    ncycles = 0;
    while (growing ) {
        nuats = 0;
        ringclosed = FALSE;
        for (pnow = 0; pnow < np; pnow++ ) if (!p[pnow].done) {
            UTL_SET_COPY_INPLACE( cnats, p[pnow].path );
            UTL_SET_CLEAR( nxcn );
            elem = -1;
            /* accumulate this generation of attached atoms into nxcn */
            while ( (elem = UTL_SET_NEXT( cnats, elem )) >= 0 ) {
                UTL_SET_CLEAR( nuls );
                if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, elem, CtAtomBondCount, &ntoats ) )) goto error;
                if (ntoats > 20) goto toomanyattms;
                if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, elem, CtAtomBondToAtoms, &toats ) )) goto
                    error;
                for (natt = 0; natt < ntoats; natt++) UTL_SET_INSERT( nuls, toats[ natt ] );
                UTL_SET_DELETE( nuls, a1 );
                UTL_SET_DIFF_INPLACE( nuls, end_atoms, nuls );
                UTL_SET_OR_INPLACE( nxcn, nuls, nxcn );
                UTL_SET_DIFF_INPLACE( nxcn, p[pnow].path, nxcn );
            }
            UTL_SET_COPY_INPLACE( p[pnow].nxtls, nxcn );
        }

        /* mark if reached root */
        for (pnow = 0; pnow < np; pnow++) {
            /* remove duplicate atoms caused by new ring closure */
            for (pdone = 0; pdone < np; pdone++ ) if (pdone != pnow) {
                UTL_SET_AND_INPLACE( p[pnow].path, p[pdone].nxtls, a2chk );
                if ((new_rings = UTL_SET_CARDINALITY( a2chk ))) {
                    /* we have ring closure(s) */
                    ringclosed = TRUE;
                    UTL_SET_OR_INPLACE( end_atoms, a2chk, end_atoms );
                    UTL_SET_DIFF_INPLACE( p[pdone].nxtls, a2chk, p[pdone].nxtls );
                }
            }
            /* stop growing a path that has reached anything in attach3set */
            if (toroot) {
                elem = -1;
                while ((elem = UTL_SET_NEXT( attach3set, elem )) >= 0 ) {
                    if (UTL_SET_MEMBER( p[pnow].path, elem ) ) {
                        p[pnow].done = TRUE;
                        break;
                    }
                }
            }
        }

        /* add all OK new atoms to all paths */
        for (pnow = 0; pnow < np; pnow++) {
            UTL_SET_OR_INPLACE( p[pnow].path, p[pnow].nxtls, p[pnow].path );
            UTL_SET_CLEAR( p[pnow].nxtls );
        }

        /* done growing paths if no more atoms added to any path .. */
        for (pdone = 0, nuats = 0; pdone < np; pdone++ )
            nuats += UTL_SET_CARDINALITY( p[pdone].path );
        if (nuats <= nats && !ringclosed) growing = FALSE;
        nats = nuats;

        /* .. or after 100 atom layers out regardless */
        ncycles++;
        if (ncycles >= 100) growing = FALSE;
    }

    /* debugging */
    if (debug) for (pdone = 0; pdone < np; pdone++ ) {
        sprintf( tempString, "Path %d (from %d): ",
                 pdone+1, p[pdone].root );
        fprintf( stdout, tempString );
        ashow( p[pdone].path );
    }

    if (full_data) {
        if (!( bptr->detail->to_atts = UTL_SET_CREATE( natot + 1 ) )) goto error;
        UTL_SET_INSERT( bptr->detail->to_atts, a1 );
    }

    /* compute the path properties */
    for (pdone = 0; pdone < np; pdone++) {
        p[pdone].chosen = toroot;
        if (toroot) {
            p[pdone].chosen = FALSE;
            elem = -1;
            while ((elem = UTL_SET_NEXT( attach3set, elem )) >= 0 ) {
                if (UTL_SET_MEMBER( p[pdone].path, elem ) ) {
                    /* recording atom ID for later use */
                    p[pdone].chosen = TRUE;
                    p[pdone].a3id = elem;
                    arec2 = p[pdone].root;
                    break;
                }
            }
        }
        p[pdone].nats = UTL_SET_CARDINALITY( p[pdone].path );
        p[pdone].nrings = p[pdone].nrings ? 1 : 0;
        p[pdone].mw = 0.0;
        p[pdone].done = 0;
        if (full_data) UTL_SET_OR_INPLACE( bptr->detail->to_atts, p[pdone].path,
                bptr->detail->to_atts );
    }


    /* return all root atoms, ordered best to worst */
    for (p2do = 0; p2do < np; p2do++ ) {
        /* start with first unchosen atom */
        for (pdone = 0; pdone < np; pdone++) if (!p[pdone].done) {
            best = pdone;
            break;
        }

        /* look for something better */
        for (pdone = 0; pdone < np; pdone++) if (!p[pdone].done && pdone != best) {
            decision = FALSE;
            if (p[best].chosen != p[pdone].chosen) {
                decision = TRUE;
                if (!p[best].chosen && p[pdone].chosen) best = pdone;
            }
            if (!decision) {
                if (p[pdone].nats != p[best].nats ) {
                    decision = TRUE;
                    if (p[pdone].nats > p[best].nats) best = pdone;
                }
            }
            if (!decision) {
                p[pdone].mw = get_path_mw( p[pdone].path, ct, p[pdone].mw );
                p[best].mw = get_path_mw( p[best].path, ct, p[best].mw );
                if (p[pdone].mw - p[best].mw > 0.01 * p[best].mw ||
                    p[pdone].mw - p[best].mw < -0.01 * p[best].mw ) {
                    decision = TRUE;
                    if (p[pdone].mw - p[best].mw > 0.01 * p[best].mw) best = pdone;
                }
            }

            /* checking relative geometries of attachments via "improper" torsion */

            /* the phenyl ether problem - if candidates are 180 degrees apart and we are on the
               root side of the torsion, pick the atom to the "right", not the "left", of the main chain */
            if (!decision && toroot && p[pdone].a3id ) {
                /* are we 180 apart? */
                a4 = p[pdone].a3id;
                pot1 = UTL_GEOM_TAU( coo+(a4-1)*3, coo+(a1-1)*3, coo+(arec2-1)*3,
                        coo+(p[best].root-1)*3 );
                pot2 = UTL_GEOM_TAU( coo+(a4-1)*3, coo+(a1-1)*3, coo+(arec2-1)*3,
                        coo+(p[pdone].root-1)*3 );
                podiff = pot1 - pot2;
                while (podiff < 0.0) podiff += 360.0;
                while (pot2 < 0.0) pot2 += 360.0;
                if (podiff < 190.0 && podiff > 170.0 ) {
                    decision = TRUE;
                    if (pot2 < 180.0) best = pdone;
                }
            }

            if (!decision) {
                /* if not already set, according to the previous special case, then */
                /* if torsions differ by 360 degrees then we have trans, prefer the +180 */
                t1 = UTL_GEOM_TAU( coo+(p[pdone].root-1)*3, coo+(a1-1)*3, coo+(arec2-1)*3,
                        coo+(p[best].root-1)*3 );
                t2 = UTL_GEOM_TAU( coo+(p[best].root-1)*3, coo+(a1-1)*3, coo+(arec2-1)*3,
                        coo+(p[pdone].root-1)*3 );
                diff = t1 - t2;
                if (diff > 355.0) best = pdone;
                else if (diff > -355.0) {
                    while (t1 < 0.0) t1 += 360.0;
                    if (t1 > 170.0 && t1 <= 350.0) best = pdone;
                }
            }
        }


        /* output all information about this atom */
        if (p2do < 3) {
            if (full_data) {
                if (p2do) {
                    bptr->detail->identical[ p2do - 1 ] = lastdecision ? 1 : 0;
                    bptr->detail->nat1vs2[ p2do - 1 ] = lastnats - p[best].nats;
                    bptr->detail->lastnat[ p2do - 1 ] = p[best].nats;
                }
                bptr->detail->best[ p2do ] = p[best].root;
            } else only_atoms[ p2do ] = p[best].root;
        }

        lastnats = p[best].nats;
        lastdecision = decision;
        p[best].done = TRUE;
    }

    retval = TRUE;
error:
    retval = TRUE;
    for (pnow = 0; pnow < np; pnow++ ) {
        if (p[pnow].path) UTL_SET_DESTROY( p[pnow].path );
        if (p[pnow].nxtls) UTL_SET_DESTROY( p[pnow].nxtls );
    }
    return( retval );

toomanyattms:
    fprintf( stderr, "Too many attachments to an atom (>20)\n" );
    goto error;
}

#endif

static double get_path_mw( set_ptr aset, struct CtConnectionTable *ct, double mw )
/* returns the total atomic weight of all atoms in aset */
{
    int elem = -1;
    double aw, ans = 0.0;

    if (mw) return( mw );
    elem = -1;
    while ( (elem = UTL_SET_NEXT( aset, elem )) >= 0 ) {
        if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, elem, CtAtomAtomicWeight, &aw ) )) return( 0.0 );
        ans += aw;
    }
    return( ans );
}


static void ashow( set_ptr aset )
/* for interactive debugging, shows a set's membership in terms of atom ID */
{
    char buff[1000], *b;
    int elem;

    *buff = '\0';
    b = buff;
    elem = -1;
    while ( (elem = UTL_SET_NEXT( aset, elem )) >= 0 ) {
        sprintf( b, " %d", elem );
        b = buff + strlen( buff );
    }
    sprintf( b, "\n" );
    fprintf( stdout, buff );
}

/* CoMFA region descriptor - here it's a hidden data type */
double *TOP_STER_EVAL_RB_ATTEN(
/* ============================================================ */
/* computes and returns a CoMFA steric field, to be freed by caller when done */
    struct CtConnectionTable *ct,
    l_RegionPtr regp,
    int root,             /* atom ID of fragment root */
    double *acoord,       /* atomic coordinate array. If NIL, coordinates are retrieved from ct */
    set_ptr a2use,        /* optionally, if not NIL, field results only from this set of atoms */
    double *ext_vdw_wt )  /* optionally, if not NIL, these are additional user-supplied wts for field
                             calculation */
{
    int natot, nat, ix, iy, iz;
    double *steric = NIL, *AtWts = NIL, *TOP_FIELD_RB_WTS(), *ftemp, *coord, *vAwt = NIL,
           *vBwt = NIL, *va, *vb, *st;
    double radnow, epsnow, diff, dis2, dis6, dis12, x, y, z, atm_steric, sum_steric,
           TOP_GET_ATOM_VDW_RADIUS();

#define MIN_SQ_DISTANCE 1.0e-4
#define RADIUS_C3 1.7
#define EPSILON_C3 .107
#define STERIC_MAX 30.0

    /* get coordinates, # atoms, RB attenuation for each atom */
    if ((ftemp = acoord )) {
        if (!DB_CT_GET_CT_ATTR( ct, CtCtAtomCount, &natot )) goto error;
    } else if (!DB_CT_GET_CT_ATTR( ct, CtCt3DCoordSet, &ftemp, &natot )) goto error;
    if (!(AtWts = TOP_FIELD_RB_WTS( ct, root, a2use ) )) goto cleanup;

    /* compute VDW terms for each atom (not for each atom type as in SYBYL) */
    if (!(vAwt = (double *) UTL_MEM_ALLOC( sizeof(double) * natot ))) goto cleanup;
    if (!(vBwt = (double *) UTL_MEM_ALLOC( sizeof(double) * natot ))) goto cleanup;
    if (regp->box_array[0].atom_type != 1 || regp->n_boxes != 1)
        fprintf( stderr, "WARNING: The C.3 probe atom type in a single box is always used in the steric field calculation.\n" );

    for (nat = 1; nat <= natot; nat++) if (!a2use || UTL_SET_MEMBER( a2use, nat )) {
        radnow = TOP_GET_ATOM_VDW_RADIUS( ct, nat, &epsnow );
        radnow += RADIUS_C3;
        epsnow = sqrt( epsnow * EPSILON_C3 );
        vAwt[ nat-1 ] = epsnow * 2.0 * pow( radnow, 6.0 ) * AtWts[ nat-1 ];
        vBwt[ nat-1 ] = epsnow * pow( radnow, 12.0 ) * AtWts[ nat-1 ];
        if (ext_vdw_wt) {
            vAwt[ nat-1 ] *= ext_vdw_wt[ nat-1 ];
            vBwt[ nat-1 ] *= ext_vdw_wt[ nat-1 ];
        }
    }

    /* empty output array */

    if (!(steric = (double *) UTL_MEM_CALLOC( regp->n_points, sizeof( double )) )) goto cleanup;
    st = steric;

    /* cycling over output array elements */
    for (iz=0, z=regp->box_array[0].lo[2]; iz < regp->box_array[0].nstep[2]; iz++, z += regp->box_array[0].stepsize[2])
    for (iy=0, y=regp->box_array[0].lo[1]; iy < regp->box_array[0].nstep[1]; iy++, y += regp->box_array[0].stepsize[1])
    for (ix=0, x=regp->box_array[0].lo[0]; ix < regp->box_array[0].nstep[0]; ix++, x += regp->box_array[0].stepsize[0])
    {
        /* cycling over ligand atoms */
        for ( nat = 0, coord = ftemp, sum_steric = 0, va = vAwt, vb = vBwt; nat < natot; nat++, va++, vb++)
            if (!a2use || UTL_SET_MEMBER( a2use, nat )) {
                dis2 = x - *coord++; dis2 *= dis2;
                diff = y - *coord++; diff *= diff; dis2 += diff;
                diff = z - *coord++; diff *= diff; dis2 += diff;
                if ( dis2 < MIN_SQ_DISTANCE ) atm_steric = STERIC_MAX * AtWts[ nat ];
                else {
                    dis6 = dis2 * dis2 * dis2;
                    dis12 = dis6 * dis6;
                    atm_steric = (*vb)/dis12 - (*va)/dis6;
                    atm_steric = atm_steric > ( STERIC_MAX * AtWts[ nat ] ) ? STERIC_MAX * AtWts[ nat ] : atm_steric;
                }
                sum_steric += atm_steric;
            }
            else coord += 3;
        *st = sum_steric > STERIC_MAX ? STERIC_MAX : sum_steric;
        st++;
    }

cleanup:
    if (AtWts) UTL_MEM_FREE( AtWts );
    if (vAwt) UTL_MEM_FREE( vAwt );
    if (vBwt) UTL_MEM_FREE( vBwt );

error:
    return( steric );
}


static l_RegionPtr getRegionToUse(double *coords, int natoms, int *r_idx, int *r_npoints )
{
    l_ComfaRegion *r;
    static double minx, maxx, miny, maxy, minz, maxz;
    int i;
    double x,y,z;
    double cminx, cminy, cminz, cmaxx, cmaxy, cmaxz;
    double edgeFact = 0.05;

    cminx = cminy = cminz = 99999.0;
    cmaxx = cmaxy = cmaxz = -99999.0;

    for ( i = 0; i < natoms; i++ )
    {
        x = *coords;
        y = *(coords+1);
        z = *(coords+2);

        if ( x < cminx )
            cminx = x;
        if ( x > cmaxx )
            cmaxx = x;
        if ( y < cminy )
            cminy = y;
        if ( y > cmaxy )
            cmaxy = y;
        if ( z < cminz )
            cminz = z;
        if ( z > cmaxz )
            cmaxz = z;
        coords += 3;
    }

    for ( i = minRegion; i < max_regions; i++ )
    {
        r = regions[i];
        minx = r->box_array[0].lo[0] + edgeFact;
        miny = r->box_array[0].lo[1] + edgeFact;
        minz = r->box_array[0].lo[2] + edgeFact;

        maxx = minx + ( (double) r->box_array[0].nstep[0] - 1.0 ) *
               r->box_array[0].stepsize[0] - (edgeFact*2.0);
        maxy = miny + ( (double) r->box_array[0].nstep[1] - 1.0 ) *
               r->box_array[0].stepsize[1] - (edgeFact*2.0);
        maxz = minz + ( (double) r->box_array[0].nstep[2] - 1.0 ) *
               r->box_array[0].stepsize[2] - (edgeFact*2.0);
#if 0
        if ( r->box_array[0].lo[0] == 0.0 )
            minx = -0.1;
#endif
        if ( cminx >= minx && cmaxx <= maxx && cminy >= miny && cmaxy <= maxy
             && cminz >= minz && cmaxz <= maxz )
        {
            *r_idx = i;
            *r_npoints = r->n_points;
            regionUseCnts[i] += 1;
            return r;
        }
    }

    i = max_regions - 1;
    *r_idx = i;
    regionUseCnts[i] += 1;
    r = regions[i];
    *r_npoints = r->n_points;
    return r;
}

static int getCordExtents(double *coords, int natoms, double *r_minx, double *r_miny, double *r_minz,
                          double *r_maxx, double *r_maxy, double *r_maxz )
{
    double minx, maxx, miny, maxy, minz, maxz;
    double x,y,z;
    int i;

    minx = maxx = *coords;
    miny = maxy = *(coords+1);
    minz = maxz = *(coords+2);
    coords += 3;

    for ( i = 1; i < natoms; i++ )
    {
        x = *coords;
        y = *(coords+1);
        z = *(coords+2);
        coords += 3;

        if ( x < minx )
            minx = x;
        else if ( x > maxx )
            maxx = x;

        if ( y < miny )
            miny = y;
        else if ( y > maxy )
            maxy = y;

        if ( z < minz )
            minz = z;
        else if ( z > maxz )
            maxz = z;
    }

    *r_minx = minx;
    *r_maxx = maxx;
    *r_miny = miny;
    *r_maxy = maxy;
    *r_minz = minz;
    *r_maxz = maxz;
    return 0;
}

static int atomsOutside(double *coords, int natoms, l_RegionPtr regp, double *atwts, double *r_outpen )

{
    static l_RegionPtr lastreg;
    static double minx, maxx, miny, maxy, minz, maxz;
    int i;
    int outside;
    double x,y,z;
    double dist;
    double edgeFact = 0.0;
    double incrfact;
    double outsidePen = 0.0;

    if ( regp != lastreg )
    {
        minx = regp->box_array[0].lo[0] + edgeFact;
        miny = regp->box_array[0].lo[1] + edgeFact;
        minz = regp->box_array[0].lo[2] + edgeFact;

        maxx = minx + (double) ( regp->box_array[0].nstep[0] - 1 ) *
               regp->box_array[0].stepsize[0] - (edgeFact*2.0);
        maxy = miny + (double) ( regp->box_array[0].nstep[1] - 1 ) *
               regp->box_array[0].stepsize[1] - (edgeFact*2.0);
        maxz = minz + (double) ( regp->box_array[0].nstep[2] - 1 ) *
               regp->box_array[0].stepsize[2] - (edgeFact*2.0);

        /* When calculating atoms outside the region, count the atoms close to the edge
           as well.
        */
        lastreg = regp;
#if 0
        fprintf(stderr,"%6.2lf %6.2lf %6.2lf %6.2lf %6.2lf %6.2lf \n",
                minx, maxx, miny, maxy, minz, maxz );
#endif
    }

    outsidePen = 0.0;
    for ( i = outside = 0; i < natoms; i++ )
    {
        x = *coords;
        y = *(coords+1);
        z = *(coords+2);

        if ( x < minx || x > maxx || y < miny || y > maxy || z < minz || z > maxz )
        {
            outside++;
            /* calculate a crude distance anyway */
            dist = 0.0;
            if ( x < minx )
                dist += x*x - minx*minx;
            else if ( x > maxx )
                dist += x*x - maxx*maxx;

            if ( y < miny )
                dist += y*y - miny*miny;
            else if ( y > maxy )
                dist += y*y - maxy*maxy;

            if ( z < minz )
                dist += z*z - minz*minz;
            else if ( z > maxz )
                dist += z*z - maxz*maxz;

            dist = fabs(dist); /* just in case */

            if (dist >= 1.0)
                incrfact = STERIC_MAX * atwts[i];
            else
                incrfact = STERIC_MAX * atwts[i] * dist;
            outsidePen += incrfact*incrfact;
#if 0
            fprintf(stderr, "outside %d atom:%d %6.2lf %6.2lf %6.2lf points: %d %6.2lf %6.2lf %6.2lf %6.2lf %6.2lf %6.2lf \n",
                    outside, i, x, y, z, regp->n_points, minx, miny, minz, maxx, maxy, maxz);
#endif
        }
        coords += 3;
    }
    *r_outpen = outsidePen;
#if 0
    fprintf(stderr, "i_extent: x %d %d y %d %d z %d %d\n",
            (int) cminx, (int) cmaxx, (int) cminy, (int) cmaxy, (int) cminz, (int) cmaxz );
    fprintf(stderr,"extent: x %6.1lf %6.1lf %6.1lf %6.1lf %6.1lf %6.1lf \n",
            cminx, cmaxx, cminy, cmaxy, cminz, cmaxz );
#endif
    if ( outside )
        t_outside++; /* t_outside counts how many compounds have at least one atom outside the field */
    t_fields++;
    return outside;
}
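The region-selection and outside-atom routines above both rest on the same axis-aligned bounds test: a lattice spans from `lo` to `lo + (nstep - 1) * stepsize` on each axis, optionally shrunk by an edge margin. The sketch below isolates that test. The `Box` struct and the names `fits_in_box` and `pick_box` are hypothetical simplifications, not the listing's `l_Box` or `getRegionToUse`, and the boxes are assumed to be ordered from smallest to largest.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for one lattice box. */
typedef struct {
    double lo[3];        /* lower corner */
    double stepsize[3];  /* grid spacing per axis */
    int    nstep[3];     /* number of grid points per axis */
} Box;

/* Returns 1 if every atom (x,y,z triples in coords) lies inside the lattice
   after shrinking it by `edge` on each face, otherwise 0. */
int fits_in_box(const double *coords, int natoms, const Box *b, double edge)
{
    int i, k;

    assert(natoms >= 0);
    for (i = 0; i < natoms; i++) {
        for (k = 0; k < 3; k++) {
            double lo = b->lo[k] + edge;
            double hi = lo + (b->nstep[k] - 1) * b->stepsize[k] - 2.0 * edge;
            double v  = coords[3 * i + k];
            if (v < lo || v > hi) return 0;
        }
    }
    return 1;
}

/* Index of the first box that contains the whole molecule; falls back to the
   last (largest) box when none fits. */
int pick_box(const double *coords, int natoms, const Box *boxes, int nboxes, double edge)
{
    int i;

    assert(nboxes > 0);
    for (i = 0; i < nboxes; i++)
        if (fits_in_box(coords, natoms, &boxes[i], edge)) return i;
    return nboxes - 1;
}
```

Choosing the smallest box that fits keeps the number of lattice points, and therefore the cost of the field calculation, as low as the molecule allows.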

double *TOP_STER_EVAL_ALL_RB_ATTEN(
/* ================================================================ */
/* computes and returns a CoMFA steric field, to be freed by caller when done */
    struct CtConnectionTable *ct,
    l_RegionPtr regp,
    int root,        /* atom ID of fragment root */
    double *acoord,  /* atomic coordinate array. If NIL, coordinates are retrieved from ct */
    double *AtWts )  /* optionally, if not NIL, these are additional user-supplied wts for field calculation */
{
#ifndef NO_COMPRESSION
    static int max_alloc;
    static double *st_steric;
#endif
    int natot, nat, ix, iy, iz;
    double *steric=NIL, *TOP_FIELD_RB_WTS(), *ftemp, *coord, *vAwt=NIL, *vBwt=NIL,
           *va, *vb, *st;
    double radnow, epsnow, diff, dis2, dis6, dis12, x, y, z, atm_steric, sum_steric,
           TOP_GET_ATOM_VDW_RADIUS();
    double xd, yd, zd;
    double maxw;
    double stepz, stepy, stepx;
    int nstepz, nstepy, nstepx;
    double lowz, lowy, lowx;
#if 0
    int startEmpty, endEmpty;
#endif
    int npoints;
    int freeWeights = 0;
    int outsideCnt = 0;
#if 0
    static double mindis = 99999.0;
    static double maxdis = -99999.0;
    static double maxdists[50];
    static int distIdx = -1;
    double abs_steric;
#endif

#define MIN_SQ_DISTANCE 1.0e-4
#define RADIUS_C3 1.7
#define EPSILON_C3 .107
#define STERIC_MAX 30.0

#if 0
    if ( distIdx == -1 )
    {
        for ( nat = 0; nat < 50; nat++ )
            maxdists[nat] = STERIC_MAX * -1.0;
        distIdx = 0;
    }
#endif

    /* get coordinates, # atoms, RB attenuation for each atom */
    if ((ftemp = acoord )) {
        if (!DB_CT_GET_CT_ATTR( ct, CtCtAtomCount, &natot )) goto error;
    } else if (!DB_CT_GET_CT_ATTR( ct, CtCt3DCoordSet, &ftemp, &natot )) goto error;

#if 0
    AtWts = computeVdwWeights(ct, root - 1, -1, q_ReductionFactor, (int **) 0 );
#endif
    if ( !AtWts )
    {
        AtWts = (double *) malloc( natot * sizeof(double) );
        for ( nat = 0; nat < natot; nat++ )
            AtWts[nat] = 1.0;
        freeWeights = 1;
    }
#if 0
    if (!(AtWts = TOP_FIELD_RB_WTS( ct, root, (set_ptr) 0 ) )) goto cleanup;
    for ( nat = 0; q_debugfp && ext_vdw_wt && nat < ct->atomCount; nat++ )
    {
        fprintf(q_debugfp ,"# weights %d %8.3lf %8.3lf\n", nat+1, AtWts[nat],
                ext_vdw_wt[nat] );
    }
#endif


    /* compute VDW terms for each atom (not for each atom type as in SYBYL) */
    if (!(vAwt = (double *) UTL_MEM_ALLOC( sizeof(double) * natot ))) goto cleanup;
    if (!(vBwt = (double *) UTL_MEM_ALLOC( sizeof(double) * natot ))) goto cleanup;
    if (regp->box_array[0].atom_type != 1 || regp->n_boxes != 1)
        fprintf( stderr, "WARNING: The C.3 probe atom type in a single box is always used in the steric field calculation.\n" );
    for (nat = 1; nat <= natot; nat++)
    {
        radnow = TOP_GET_ATOM_VDW_RADIUS( ct, nat, &epsnow );
        radnow += RADIUS_C3;
        epsnow = sqrt( epsnow * EPSILON_C3 );
        vAwt[ nat-1 ] = epsnow * 2.0 * pow( radnow, 6.0 ) * AtWts[ nat-1 ];
        vBwt[ nat-1 ] = epsnow * pow( radnow, 12.0 ) * AtWts[ nat-1 ];
#if 0
        if (ext_vdw_wt) {
            vAwt[ nat-1 ] *= ext_vdw_wt[ nat-1 ];
            vBwt[ nat-1 ] *= ext_vdw_wt[ nat-1 ];
        }
#endif
    }

    /* empty output array */
    /* Don't initialize with calloc: every element is set below, so zeroing is a waste of time.
       A 38% speedup was obtained by calling malloc instead of calloc.
    */
    nstepz = regp->box_array[0].nstep[2];
    nstepy = regp->box_array[0].nstep[1];
    nstepx = regp->box_array[0].nstep[0];

    stepz = regp->box_array[0].stepsize[2];
    stepy = regp->box_array[0].stepsize[1];
    stepx = regp->box_array[0].stepsize[0];

    npoints = nstepz * nstepy * nstepx;

    lowz = regp->box_array[0].lo[2];
    lowy = regp->box_array[0].lo[1];
    lowx = regp->box_array[0].lo[0];

#ifndef NO_COMPRESSION
    if ( npoints > max_alloc )
    {
        if ( !max_alloc )
            max_alloc = 4000;
        while ( npoints > max_alloc )
            max_alloc *= 2;
        if ( st_steric )
            free((char *) st_steric );
        st_steric = (double *) malloc(sizeof(double) * max_alloc );
    }
    steric = st_steric;
#else
    steric = (double *) malloc( npoints * sizeof( double ) );
#endif

    st = steric;

    /* cycling over output array elements */
    for (iz=0, z=lowz; iz < nstepz; iz++, z += stepz )
    for (iy=0, y=lowy; iy < nstepy; iy++, y += stepy )
    for (ix=0, x=lowx; ix < nstepx; ix++, x += stepx )
    {
        /* cycling over ligand atoms */
        for ( nat = 0, coord = ftemp, sum_steric = 0.0, va = vAwt, vb = vBwt;
              nat < natot && sum_steric < STERIC_MAX;
              nat++, va++, vb++)
        {
#if 0
            dis2 = x - *coord++; dis2 *= dis2;
            diff = y - *coord++; diff *= diff; dis2 += diff;
            diff = z - *coord++; diff *= diff; dis2 += diff;
#endif
            xd = x - *coord++;
            yd = y - *coord++;
            zd = z - *coord++;
            dis2 = xd*xd + yd*yd + zd*zd;
#if 0
            if (dis2 > 49.0)
                continue;
#endif
            if ( dis2 >= MIN_SQ_DISTANCE )
            {
                dis6 = dis2 * dis2 * dis2;
                dis12 = dis6 * dis6;
                atm_steric = (*vb)/dis12 - (*va)/dis6;
#if 0
                abs_steric = fabs(atm_steric);
                if ( AtWts[nat] == 1.0 && dis2 > 0.0 )
                {
                    if ( dis2 < mindis && abs_steric < 0.001 )
                    {
                        fprintf(stderr,"%10.8lf dis:%7.3lf\n", atm_steric, dis2 );
                        mindis = dis2;
                    }
                    distIdx = (int) dis2;
                    if ( distIdx < 49 && abs_steric > maxdists[distIdx] )
                    {
                        fprintf(stderr,"idx %d: %10.8lf dis:%10.5lf abs:%8.4lf max\n",
                                distIdx, atm_steric, dis2, abs_steric);
                        maxdists[distIdx] = abs_steric;
                    }
                }
#endif
                maxw = STERIC_MAX * AtWts[ nat ];
                if ( atm_steric > maxw )
                    atm_steric = maxw;
            }
            else
            {
                atm_steric = STERIC_MAX * AtWts[ nat ];
            }
            sum_steric += atm_steric;
        }
        *st = sum_steric > STERIC_MAX ? STERIC_MAX : sum_steric;
        st++;
    }

#if 0
    for ( st = steric, iz = startEmpty = 0; iz < npoints && *st < 0.01; iz++, st++ )
    {
        startEmpty++;
    }
    for ( st = steric + (npoints - 1), iz = npoints, endEmpty = 0; iz && *st < 0.01; iz--, st-- )
    {
        endEmpty++;
    }
    fprintf(stderr,"%d %d of %d %6.2lf \n",
            startEmpty, endEmpty, npoints, ((double) (startEmpty+endEmpty)*100.0)/(double) npoints );
#endif

cleanup:
    if (AtWts && freeWeights) free( (char*) AtWts );
    if (vAwt) UTL_MEM_FREE( vAwt );
    if (vBwt) UTL_MEM_FREE( vBwt );

error:
    return( steric );
}

double *TOP_STER_ATOM_EVAL_ALL_RB_ATTEN(
/* ================================================================ */
/* computes and returns a CoMFA steric field, to be freed by caller when done,
   this version only computes the fields around each atom, outer loop is the ct's atoms */
    struct CtConnectionTable *ct,
    l_RegionPtr regp,
    int root,        /* atom ID of fragment root */
    double *acoord,  /* atomic coordinate array. If NIL, coordinates are retrieved from ct */
    double *AtWts )  /* optionally, if not NIL, these are additional user-supplied wts for field calculation */
{
#ifndef NO_COMPRESSION
    static int max_alloc;
    static double *st_steric;
#endif
    int natot, nat, ix, iy, iz;
    double *steric=NIL, *TOP_FIELD_RB_WTS(), *ftemp, *coord, *vAwt=NIL, *vBwt=NIL,
           *st;
    double va, vb;
    double radnow, epsnow, diff, dis2, dis6, dis12, x, y, z, atm_steric, sum_steric,
           TOP_GET_ATOM_VDW_RADIUS();
    double xd, yd, zd;
    double maxw;
    double stepz, stepy, stepx;
    int nstepz, nstepy, nstepx;
    double lowz, lowy, lowx;
    double curr_lowz, curr_lowy, curr_lowx;
    int curr_nstepsz, curr_nstepsy, curr_nstepsx;
    int curr_ix, curr_iy, curr_iz;
    double curr_x, curr_y, curr_z;
    int max_steps; /* assumes stepz, stepy, and stepx are the same step size */
    int max_xSteps, max_ySteps, max_zSteps;
#if 0
    int startEmpty, endEmpty;
#endif
    int npoints;
    int freeWeights = 0;
    int outsideCnt = 0;
#if 0
    static double mindis = 99999.0;
    static double maxdis = -99999.0;
    static double maxdists[50];
    static int distIdx = -1;
    double abs_steric;
#endif

#define MIN_SQ_DISTANCE 1.0e-4
#define RADIUS_C3 1.7
#define EPSILON_C3 .107
#define STERIC_MAX 30.0

#if 0
    if ( distIdx == -1 )
    {
        for ( nat = 0; nat < 50; nat++ )
            maxdists[nat] = STERIC_MAX * -1.0;
        distIdx = 0;
    }
#endif

    /* get coordinates, # atoms, RB attenuation for each atom */
    if ((ftemp = acoord )) {
        if (!DB_CT_GET_CT_ATTR( ct, CtCtAtomCount, &natot )) goto error;
    } else if (!DB_CT_GET_CT_ATTR( ct, CtCt3DCoordSet, &ftemp, &natot )) goto error;

#if 0
    AtWts = computeVdwWeights(ct, root - 1, -1, q_ReductionFactor, (int **) 0 );
#endif
    if ( !AtWts )
    {
        AtWts = (double *) malloc( natot * sizeof(double) );
        for ( nat = 0; nat < natot; nat++ )
            AtWts[nat] = 1.0;
        freeWeights = 1;
    }
#if 0
    if (!(AtWts = TOP_FIELD_RB_WTS( ct, root, (set_ptr) 0 ) )) goto cleanup;
    for ( nat = 0; q_debugfp && ext_vdw_wt && nat < ct->atomCount; nat++ )
    {
        fprintf(q_debugfp ,"# weights %d %8.3lf %8.3lf\n", nat+1, AtWts[nat],
                ext_vdw_wt[nat] );
    }
#endif


    /* compute VDW terms for each atom (not for each atom type as in SYBYL) */
    if (!(vAwt = (double *) UTL_MEM_ALLOC( sizeof(double) * natot ))) goto cleanup;
    if (!(vBwt = (double *) UTL_MEM_ALLOC( sizeof(double) * natot ))) goto cleanup;
    if (regp->box_array[0].atom_type != 1 || regp->n_boxes != 1)
        fprintf( stderr, "WARNING: The C.3 probe atom type in a single box is always used in the steric field calculation.\n" );
    for (nat = 1; nat <= natot; nat++)
    {
        radnow = TOP_GET_ATOM_VDW_RADIUS( ct, nat, &epsnow );
        radnow += RADIUS_C3;
        epsnow = sqrt( epsnow * EPSILON_C3 );
        vAwt[ nat-1 ] = epsnow * 2.0 * pow( radnow, 6.0 ) * AtWts[ nat-1 ];
        vBwt[ nat-1 ] = epsnow * pow( radnow, 12.0 ) * AtWts[ nat-1 ];
#if 0
        if (ext_vdw_wt) {
            vAwt[ nat-1 ] *= ext_vdw_wt[ nat-1 ];
            vBwt[ nat-1 ] *= ext_vdw_wt[ nat-1 ];
        }
#endif
    }


    /* empty output array */
    /* Don't initialize with calloc: every element is set below, so zeroing is a waste of time.
       A 38% speedup was obtained by calling malloc instead of calloc.
    */
    nstepz = regp->box_array[0].nstep[2];
    nstepy = regp->box_array[0].nstep[1];
    nstepx = regp->box_array[0].nstep[0];

    stepz = regp->box_array[0].stepsize[2];
    stepy = regp->box_array[0].stepsize[1];
    stepx = regp->box_array[0].stepsize[0];

    npoints = nstepz * nstepy * nstepx;

    lowz = regp->box_array[0].lo[2];
    lowy = regp->box_array[0].lo[1];
    lowx = regp->box_array[0].lo[0];

    max_steps = (int) (4.0 / stepx);
    if ( max_steps <= 0 || ((double) max_steps * stepx ) < 4.0 )
        max_steps += 1;
    max_xSteps = max_ySteps = max_zSteps = max_steps * 2;

    if ( max_xSteps > nstepx )
        max_xSteps = nstepx;
    if ( max_ySteps > nstepy )
        max_ySteps = nstepy;
    if ( max_zSteps > nstepz )
        max_zSteps = nstepz;


#if 0
    fprintf(stderr, "max steps: %d %d %d %d\n", max_steps, max_xSteps, max_ySteps, max_zSteps );
#endif

#ifndef NO_COMPRESSION
    if ( npoints > max_alloc )
    {
        if ( !max_alloc )
            max_alloc = 4000;
        while ( npoints > max_alloc )
            max_alloc *= 2;
        if ( st_steric )
            free((char *) st_steric );
        st_steric = (double *) malloc(sizeof(double) * max_alloc );
    }
    steric = st_steric;
    memset((char *) st_steric, '\0', sizeof(double) * npoints );
#else
    steric = (double *) calloc( npoints, sizeof( double ) );
#endif

    st = steric;


    for ( nat = 0, coord = ftemp;
          nat < natot;
          nat++ )
    {
        va = *(vAwt + nat);
        vb = *(vBwt + nat);

        curr_x = *coord;
        curr_y = *(coord+1);
        curr_z = *(coord+2);
        coord += 3;

        iz = (int) ( fabs(curr_z - lowz + 0.5) / stepz);
        iy = (int) ( fabs(curr_y - lowy + 0.5) / stepy);
        ix = (int) ( fabs(curr_x - lowx + 0.5) / stepx);

        curr_iz = iz - max_steps;
        curr_iy = iy - max_steps;
        curr_ix = ix - max_steps;

        curr_nstepsz = iz + max_steps + 1;
        curr_nstepsy = iy + max_steps + 1;
        curr_nstepsx = ix + max_steps + 1;

        /* check boundary conditions, where the atom is near the outside of the region */
        if ( curr_iz < 0 )
            curr_iz = 0;
        if ( curr_iy < 0 )
            curr_iy = 0;
        if ( curr_ix < 0 )
            curr_ix = 0;

        /* Compute the fringe if outside the range */
        if ( curr_iz >= nstepz )
            curr_iz = nstepz - 1;
        if ( curr_iy >= nstepy )
            curr_iy = nstepy - 1;
        if ( curr_ix >= nstepx )
            curr_ix = nstepx - 1;

        if ( curr_nstepsz > nstepz )
            curr_nstepsz = nstepz;
        if ( curr_nstepsy > nstepy )
            curr_nstepsy = nstepy;
        if ( curr_nstepsx > nstepx )
            curr_nstepsx = nstepx;

        curr_lowz = lowz + (double) curr_iz * stepz;
        curr_lowy = lowy + (double) curr_iy * stepy;
        curr_lowx = lowx + (double) curr_ix * stepx;

        maxw = STERIC_MAX * AtWts[ nat ];

#if 0
        fprintf(stderr, "xyz%6.1lf %6.1lf %6.1lf low: %6.1lf %6.1lf %6.1lf steps: %d %d %d clow: %6.1lf %6.1lf %6.1lf idx: %d %d %d r-ridx: %d %d %d csteps:%d %d %d\n",
                curr_x, curr_y, curr_z,
                lowx, lowy, lowz,
                nstepx, nstepy, nstepz,
                curr_lowx, curr_lowy, curr_lowz,
                curr_ix, curr_iy, curr_iz,
                ix, iy, iz,
                curr_nstepsx, curr_nstepsy, curr_nstepsz );
#endif


        /* cycling over output array elements */
        for ( iz=curr_iz, z=curr_lowz; iz < curr_nstepsz; iz++, z += stepz )
        {
            zd = z - curr_z;
            zd = zd*zd;

            for ( iy=curr_iy, y=curr_lowy; iy < curr_nstepsy; iy++, y += stepy )
            {
                yd = y - curr_y;
                yd = yd*yd;
#if 0
                if ( (zd+yd) > 49.0 )
                    continue;
#endif
                st = st_steric + ( (iz * nstepy * nstepx ) + (iy * nstepx ) + curr_ix );
#if 0
                fprintf(stderr,"base %d from %d %d %d (matrix: %d %d %d)\n",
                        (iz * nstepy * nstepx ) + (iy * nstepx) + curr_ix,
                        curr_ix, iy, iz, nstepx, nstepy, nstepz );
#endif
#if 0
                if ( !(iy%3) )
                    sleep(1);
#endif
                for ( ix=curr_ix, x=curr_lowx; ix < curr_nstepsx; ix++, x += stepx )
                {
                    sum_steric = *st;
                    xd = x - curr_x;
                    dis2 = xd*xd + yd + zd;
                    if (dis2 > 49.0)
                    {
                        st++;   /* keep st in step with ix when skipping a point */
                        continue;
                    }

                    if ( dis2 >= MIN_SQ_DISTANCE )
                    {
                        dis6 = dis2 * dis2 * dis2;
                        dis12 = dis6 * dis6;
                        atm_steric = vb/dis12 - va/dis6;
                        if ( atm_steric > maxw )
                            atm_steric = maxw;
                    }
                    else
                    {
                        atm_steric = maxw;
                    }

                    sum_steric += atm_steric;
                    *st = sum_steric > STERIC_MAX ? STERIC_MAX : sum_steric;
                    st++;
                }
            }
        }
    }


#if 0
    for ( st = steric, iz = startEmpty = 0; iz < npoints && *st < 0.01; iz++, st++ )
    {
        startEmpty++;
    }
    for ( st = steric + (npoints - 1), iz = npoints, endEmpty = 0; iz && *st < 0.01; iz--, st-- )
    {
        endEmpty++;
    }
    fprintf(stderr,"%d %d of %d %6.2lf \n",
            startEmpty, endEmpty, npoints, ((double) (startEmpty+endEmpty)*100.0)/(double) npoints );
#endif

cleanup:
    if (AtWts && freeWeights) free( (char*) AtWts );
    if (vAwt) UTL_MEM_FREE( vAwt );
    if (vBwt) UTL_MEM_FREE( vBwt );

error:
    return( steric );
}


int TOP_STER_REGION_MODE(int regionMode ) 
{ 

if ( regionMode < 0 ) 

regionMode = 0; 
else if ( regionMode > 2 ) 

regionMode = 2; 

q_regionMode = regionMode;
}


static int makeTopRegions(double stepSize, int numFrags)
{
    int i;
    l_ComfaRegion *r;
    l_Box *b;
    int nsteps;
    static double lastStepSize;
    static int printed;
    int intStep;
    int baseSteps = 5;
    int steps[3];
    double fullMult;
    int maxtrixSize;
    int totalPoints;
    int bigseen = 0;
    double baseX, baseY, baseZ;
    int done;

    if ( lastStepSize == stepSize )
        return 0;
    lastStepSize = stepSize;
    baseX = -0.1;
    baseY = -6.0;
    baseZ = -4.0;
    totalPoints = 0;

    if ( qxmin != 999.0 && qmode )
    {
        baseX = (double) ( (int) (qxmin - 1.0) );
        baseY = (double) ( (int) (qymin - 1.0) );
        baseZ = (double) ( (int) (qzmin - 1.0) );
        baseSteps = 0;

        steps[0] = (int) ((qxmax - baseX + 1.50) / stepSize) + 1;
        steps[1] = (int) ((qymax - baseY + 1.50) / stepSize) + 1;
        steps[2] = (int) ((qzmax - baseZ + 1.50) / stepSize) + 1;
#ifdef TRIPOS_VERSION
        fprintf(stderr,"%6.2lf %6.2lf %6.2lf, %6.2lf %6.2lf %6.2lf %d %d %d\n",
                qxmin, qymin, qzmin, qxmax, qymax, qzmax, steps[0], steps[1], steps[2] );
#endif
    }
    else
    {
        steps[0] = steps[1] = steps[2] = 5;
    }

    maxtrixSize = steps[0] * steps[1] * steps[2];
    max_regions = NO_REGIONS;

    /*
       We have to limit the number of regions generated to conserve memory.
       If the initial region size to fit the query in is huge, then let's not
       create too many regions around it.
    */
    for ( i = bigseen = done = 0; !done && i < max_regions; i++ )
    {
        if ( regions[i] )
            free((char *) regions[i] );
        r = (l_RegionPtr) UTL_MEM_CALLOC(1,sizeof(l_ComfaRegion));
        r->n_boxes = 1;
        regions[i] = r;
        if ( r->box_array )
            free((char *) r->box_array );
        b = r->box_array = (l_BoxPtr) UTL_MEM_CALLOC(1,sizeof(l_Box) );
        b[0].atom_type = 1;

        b[0].stepsize[0] = b[0].stepsize[1] = b[0].stepsize[2] = stepSize;

        b[0].lo[0] = baseX;
        b[0].lo[1] = baseY;
        b[0].lo[2] = baseZ;
        b[0].nstep[0] = steps[0];
        b[0].nstep[1] = steps[1];
        b[0].nstep[2] = steps[2];
#ifdef TRIPOS_VERSION
        if ( !printed )
        {
            fprintf(stderr,"%d: steps: %d,%d,%d stepsize: %6.2lf base: %6.2lf %6.2lf %6.2lf\n",
                    i, steps[0], steps[1], steps[2], stepSize, b[0].lo[0], b[0].lo[1], b[0].lo[2] );
        }
#endif

        r->n_points = steps[0] * steps[1] * steps[2];
        totalPoints += r->n_points;

        done = 0;
        if ( i >= 3 && steps[0] > 12 && steps[1] > 12 && steps[2] > 12 )
            done = i+1;

        if ( r->n_points > 3000 || totalPoints > 6000 )
        {
            if ( bigseen == 0 && r->n_points < 5000 && totalPoints < 10000 )
            {
                baseX -= stepSize;
                baseY -= stepSize;
                baseZ -= stepSize;
                steps[0] += 2;
                steps[1] += 2;
                steps[2] += 2;
                bigseen = 1;
            }
            else
            {
                done = i+1;
            }
        }

        else
        {
            if ( i < 4 )
            {
                steps[0] += 1;
                steps[1] += 1;
                steps[2] += 1;
                if ( i % 2 )
                {
                    baseZ -= stepSize;
                    baseX -= stepSize;
                }
                else
                    baseY -= stepSize;
            }
            else
            {
                if ( steps[0] < 13 )
                {
                    steps[0] += 1;
                    if ( !((i+4) % 4) )
                        baseX -= stepSize;
                }
                if ( steps[1] < 13 )
                {
                    steps[1] += 1;
                    if ( (i+2) % 3 )
                        baseY -= stepSize;
                }
                if ( steps[2] < 13 )
                {
                    steps[2] += 1;
                    if ( i % 2 )
                        baseZ -= stepSize;
                }
            }
        }
    }

    if ( done && done < NO_REGIONS )
        max_regions = done;
    printed = 1;
    return 1;
}


l_RegionPtr TOP_MAKE_STD_REGION()
/* ================================================================ */
/* creates a run-time description of the standard CoMFA region used for topomers
   source of region description is $DSERV_TB/rsh.rgn */
{
    l_RegionPtr R;

    if (!(R = (l_RegionPtr) UTL_MEM_CALLOC(1,sizeof(l_ComfaRegion)))) goto error;
    R->n_boxes = 1;
    if (!(R->box_array = (l_BoxPtr) UTL_MEM_CALLOC(1,sizeof(l_Box)))) goto error;

    if ( q_regionMode == 0 )
    {
        R->n_points = 1000;
        R->box_array[0].lo[0] = -4.0;
        R->box_array[0].lo[1] = -12.0;
        R->box_array[0].lo[2] = -8.0;
        R->box_array[0].hi[0] = 14.0;
        R->box_array[0].hi[1] = 6.0;
        R->box_array[0].hi[2] = 10.0;
        R->box_array[0].stepsize[0] = 2.0;
        R->box_array[0].stepsize[1] = 2.0;
        R->box_array[0].stepsize[2] = 2.0;
        R->box_array[0].nstep[0] = 10;
        R->box_array[0].nstep[1] = 10;
        R->box_array[0].nstep[2] = 10;
        R->box_array[0].atom_type = 1; /* c.3 atom */
    }

    else if ( q_regionMode == 1 ) /* bigger */
    {
        R->n_points = 13*13*13;
        R->box_array[0].lo[0] = -4.0;
        R->box_array[0].lo[1] = -16.0;
        R->box_array[0].lo[2] = -10.0;
        R->box_array[0].hi[0] = 18.0;
        R->box_array[0].hi[1] = 8.0;
        R->box_array[0].hi[2] = 14.0;
        R->box_array[0].stepsize[0] = 2.0;
        R->box_array[0].stepsize[1] = 2.0;
        R->box_array[0].stepsize[2] = 2.0;
        R->box_array[0].nstep[0] = 13;
        R->box_array[0].nstep[1] = 13;
        R->box_array[0].nstep[2] = 13;
        R->box_array[0].atom_type = 1; /* c.3 atom */
    }

    else /* Huge, just huge */
    {
        R->n_points = 25*25*20; /* 12,500 points */
        R->box_array[0].lo[0] = -12.0;
        R->box_array[0].lo[1] = -30.0;
        R->box_array[0].lo[2] = -20.0;
        R->box_array[0].hi[0] = 36.0;
        R->box_array[0].hi[1] = 18.0;
        R->box_array[0].hi[2] = 18.0;
        R->box_array[0].stepsize[0] = 2.0;
        R->box_array[0].stepsize[1] = 2.0;
        R->box_array[0].stepsize[2] = 2.0;
        R->box_array[0].nstep[0] = 25;
        R->box_array[0].nstep[1] = 25;
        R->box_array[0].nstep[2] = 20;
        R->box_array[0].atom_type = 1; /* c.3 atom */
    }
    return R;

error:
    return (l_RegionPtr) 0;
}

double *TOP_FIELD_RB_WTS( struct CtConnectionTable *ct, int rootid,
/* ================================================================ */
    set_ptr a2use /* optionally, if not NIL, need to process only this set of atoms */
)
/* constructs and returns weighting-by-rotatable-bond array for each atom */
{
/* pseudo code for FIELD_RB_WTS()

   while saw new atoms
     uncover atoms that stopped last shell growth
     grow next "rotational shell"
     while adding to shell
       for each atom in shell
         get neighbors not seen
         for each neighbor
           if bond is rotatable (acyclic, > 1 attached atom, not =,am,#)
             cover all other atoms attached to atom for this shell
           add it to shell
*/
    double *ansr = NIL, *vals = NIL, factor, nowfact = 1.0;
    int nats, b, aggcount, atid, aggid, loop, size, inRing, natt, ntoats, toats[20];
    set_ptr aggats = NIL, allats = NIL, nuls = NIL, endatms = NIL, end_cands = NIL;
    CtBondTypeDef bType;

    /* be sure rings were perceived */
    if (!DB_CT_UTL_FIND_RINGS( ct )) goto cleanup;

    if (!DB_CT_GET_CT_ATTR( ct, CtCtAtomCount, &nats )) goto cleanup;

    /* output data allocations */
    if (!( vals = (double*) UTL_MEM_ALLOC( sizeof(double)*nats))) goto cleanup;
    factor = aggreg_descale;
    if (!(allats = UTL_SET_CREATE( nats + 1 ) )) goto cleanup;
    if (!(aggats = UTL_SET_CREATE( nats + 1 ) )) goto cleanup;
    if (!(nuls = UTL_SET_CREATE( nats + 1 ) )) goto cleanup;
    if (!(endatms = UTL_SET_CREATE( nats + 1 ) )) goto cleanup;
    if (!(end_cands = UTL_SET_CREATE( nats + 1 ) )) goto cleanup;

    UTL_SET_INSERT( aggats, rootid );
    UTL_SET_INSERT( allats, rootid );
    aggcount = loop = 1;

    while (TRUE) {
        while (TRUE) {
            aggid = -1;
            while ((aggid = UTL_SET_NEXT( allats, aggid )) >= 0 ) {
                /* put (acceptable) atoms attached to aggid into nuls */
                UTL_SET_CLEAR( nuls );
                if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, aggid, CtAtomBondCount, &ntoats ) )) goto error;
                if (ntoats > 20) goto toomanyattms;
                if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, aggid, CtAtomBondToAtoms, &toats ) )) goto error;
                for (natt = 0; natt < ntoats; natt++) if (!a2use || UTL_SET_MEMBER( a2use, toats[natt] ))
                    UTL_SET_INSERT( nuls, toats[ natt ] );
                /* remove atoms already processed from nuls */
                UTL_SET_DIFF_INPLACE( nuls, allats, nuls );
                UTL_SET_DIFF_INPLACE( nuls, endatms, nuls );
                /* identifying any atoms that terminate this aggregate */
                atid = -1;
                while ((atid = UTL_SET_NEXT( nuls, atid )) >= 0 ) {
                    /* skipping monovalent atoms */
                    if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, atid, CtAtomBondCount, &ntoats ) )) goto error;
                    if (ntoats > 1) {
                        if (!(b = DB_CT_UTL_GET_BONDID( ct, atid, aggid ) )) goto error;
                        if (!DB_CT_GET_BOND_ATTR( ct, b, CtBondIsInRing, &inRing )
                            || !DB_CT_GET_BOND_ATTR( ct, b, CtBondType, &bType ) ) goto error;
                        if (!inRing && bType == CtBondTypeSingle ) {
                            /* have an end-of-aggregate atom, mark as end atoms all other attached atoms */
                            UTL_SET_CLEAR( end_cands );
                            if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, atid, CtAtomBondCount, &ntoats ) )) goto error;
                            if (ntoats > 20) goto toomanyattms;
                            if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, atid, CtAtomBondToAtoms, &toats ) )) goto error;
                            for (natt = 0; natt < ntoats; natt++) if (!a2use || UTL_SET_MEMBER( a2use, toats[natt] ))
                                UTL_SET_INSERT( end_cands, toats[ natt ] );
                            UTL_SET_DELETE( end_cands, aggid );
                            UTL_SET_OR_INPLACE( endatms, end_cands, endatms );
                        }
                    }
                }
                UTL_SET_OR_INPLACE( aggats, nuls, aggats );
            }
            if (UTL_SET_CARDINALITY( aggats ) <= aggcount ) break;
            aggcount = UTL_SET_CARDINALITY( aggats );
            UTL_SET_OR_INPLACE( allats, aggats, allats );
        }

        /* debugging stuff .. */
        /*
        sprintf( tempString, "Aggregate %d (weight = %f ):", loop, nowfact );
        UBS_OUTPUT_MESSAGE( stdout, tempString );
        ashow( aggats, molp );
        ashow( aggats, molp );
        */
        /* if no atoms added, we are done! */
        if (UTL_SET_EMPTY( aggats )) break;
        /* record scaling factor for atoms in this aggregate */
        atid = -1;
        while ((atid = UTL_SET_NEXT( aggats, atid )) >= 0 ) {
            vals[ atid-1 ] = nowfact;
        }
        UTL_SET_OR_INPLACE( allats, aggats, allats );
        UTL_SET_CLEAR( aggats );
        UTL_SET_CLEAR( endatms );
        aggcount = 0;
        nowfact *= factor;
        loop++;
    }

    ansr = vals;

cleanup:
error:
    if (aggats) UTL_SET_DESTROY( aggats );
    if (allats) UTL_SET_DESTROY( allats );
    if (endatms) UTL_SET_DESTROY( endatms );
    if (end_cands) UTL_SET_DESTROY( end_cands );
    if (nuls) UTL_SET_DESTROY( nuls );
    return( ansr );

toomanyattms:
    fprintf( stderr, "More than twenty atoms attached to some atom in this structure.\n" );
    goto error;
}

static char *fhex_field = NIL;
static int field_length = 0;

char *CT_FIELD2HEX( double *field, int size )
/* maps field to a hex string coarsely representing the field -
   caller must NOT free this string! */
{
    char *f;
    int i, j, fd;
    static double cutoff[ 16 ] = {9999., 0., 2., 4., 6., 8., 10., 12.,
                                  14., 16., 18., 20., 22., 24., 26., 30. };

    if ( size != field_length ) {
        if (fhex_field) UTL_MEM_FREE( fhex_field );
        if (!(fhex_field = UTL_MEM_ALLOC( sizeof( char ) * (size + 1) ) )) return NIL;
        field_length = size;
    }

    for (f = fhex_field, j = 0; j < size; j++, f++ ) {
        /* find the first bin whose upper bound covers the value */
        for ( i = 1, fd = FALSE; i < 16; i++ ) if (field[ j ] <= cutoff[ i ]) {
            fd = TRUE;
            break;
        }
        if (!fd) {
            fprintf( stderr, "Illegal steric field value set to missing.\n" );
            i = 0;
        }
        sprintf( f, "%.1x", i );
    }
    *f = '\0';
    return fhex_field;
}
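
The binning performed by CT_FIELD2HEX can be illustrated in isolation. The following is a minimal sketch, not the listing's implementation: it reuses the same sixteen thresholds but writes into a caller-supplied buffer instead of a static one, and the names demo_cutoff, demo_field_to_hex_digit and demo_field_to_hex are hypothetical.

```c
#include <stddef.h>

/* Same 16 thresholds as the listing; entry 0 is a placeholder for "missing". */
static const double demo_cutoff[16] = {9999., 0., 2., 4., 6., 8., 10., 12.,
                                       14., 16., 18., 20., 22., 24., 26., 30.};

/* Returns the hex digit of the first bin whose upper bound covers value,
   or '0' when the value exceeds every bin (treated as missing). */
static char demo_field_to_hex_digit(double value)
{
    static const char digits[] = "0123456789abcdef";
    int i;

    for (i = 1; i < 16; i++) {
        if (value <= demo_cutoff[i])
            return digits[i];
    }
    return digits[0];
}

/* Encodes n field values into out, which must hold at least n + 1 chars. */
static void demo_field_to_hex(const double *field, size_t n, char *out)
{
    size_t j;

    for (j = 0; j < n; j++)
        out[j] = demo_field_to_hex_digit(field[j]);
    out[n] = '\0';
}
```

With these thresholds a value of 0.0 falls in bin 1, a value of 30.0 falls in the last bin (f), and anything above 30.0 is encoded as 0.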

double TOP_GET_ATOM_VDW_RADIUS( struct CtConnectionTable *ct, int nat, double *epsnow )
/* hard coded to assign classical TAFF VDW properties */
{
    int sybat;
    char *sybname;

    /* epsilon values indexed by SYBYL atom type number */
    static double a_eps[34] = {
        0.000, 0.107, 0.107, 0.107, 0.107,
        0.095, 0.095, 0.095, 0.116, 0.116, /* 5 - 9 */
        0.095, 0.314, 0.095, 0.042, 0.434,
        0.314, 0.109, 0.623, 0.314, 0.095, /* 15 - 19 */
        0.000, 0.400, 0.400, 0.600, 0.400,
        0.100, 0.000, 0.042, 0.095, 0.314,
        0.314, 0.095, 0.116, 0.107 };

    /* van der Waals radii indexed by SYBYL atom type number */
    static double rval[34] = {
        0.000, 1.700, 1.700, 1.700, 1.700,
        1.550, 1.550, 1.550, 1.520, 1.520, /* 5 - 9 */
        1.800, 1.550, 1.800, 1.500, 1.850,
        1.750, 1.470, 1.980, 1.800, 1.550, /* 15 - 19 */
        0.000, 1.200, 1.200, 1.200, 1.200,
        1.341, 0.000, 2.100, 1.550, 1.800,
        1.800, 1.550, 1.520, 1.700 };

    if (!(DB_EX_ELEM_TO_SYB_ATOM_TYPE( ct, nat, &sybname, &sybat ))) {
        fprintf( stderr, "Warning: Atom type not found for atom ID %d.\n", nat );
        *epsnow = 0.0;
        return 0.0;
    }

    if ( sybat < 0 || sybat > 33 )
    {
        *epsnow = 0.0;
        return 0.0;
    }

    *epsnow = a_eps[sybat];
    return rval[sybat];



#if 0
    switch (sybat) {
    case 1:  /* c.3 */
    case 2:  /* c.2 */
    case 3:  /* car */
    case 4:  /* c.1 */
    case 33: /* c+ */
        *epsnow = .107; return( 1.7 );
    case 5:  /* n.3 */
    case 6:  /* n.2 */
    case 7:  /* n.1 */
    case 11: /* n.ar */
    case 19: /* n.lp3 */
    case 28: /* n.am */
    case 31: /* N+ */
        *epsnow = .095; return( 1.55 );
    case 8:  /* o.3 */
    case 9:  /* o.2 */
    case 32: /* o.ar */
        *epsnow = .116; return( 1.52 );
    case 10: /* s.3 */
    case 12: /* p.3 */
    case 18: /* s.2 */
    case 29: /* S.O */
    case 30: /* s.o2 */
        *epsnow = .314; return( 1.8 );
    case 13: /* H */
        *epsnow = .042; return( 1.5 );
    case 14: /* Br */
        *epsnow = .434; return( 1.85 );
    case 15: /* Cl */
        *epsnow = .314; return( 1.75 );
    case 16: /* F */
        *epsnow = .109; return( 1.47 );
    case 17: /* I */
        *epsnow = .623; return( 1.98 );
    case 21: /* Na */
    case 22: /* K */
    case 24: /* Li */
        *epsnow = 0.4; return( 1.2 );
    case 23: /* Ca */
        *epsnow = 0.6; return( 1.2 );
    case 25: /* Al */
        *epsnow = 0.1; return( 1.341 );
    case 27: /* Si */
        *epsnow = 0.042; return( 2.1 );
    default:
        fprintf( stderr, "WARNING: Assigning no steric field from atom type; %s\n", sybname );
        *epsnow = 0.0; return( 0.0 );
    }
#endif
}

int TOP_REFLECT_COO( double *coo, set_ptr atms, int npt, int *aplane )
/* ====================================================================== */
/* reflects atms through the plane defined by the atoms whose IDs are in aplane, by modifying values
   in coo */
{
    double cent[3], eval[3], evec[3][3], mat[3][3], x, xsq, xy, xz,
           y, ysq, yz, z, zsq, *cx, *cy, *cz, l, m, n, d, *xyz, h;
    int na, nrot, elem;

    /* Now perform the sums to determine the parameters of the plane */
    /* equation. */
    x = xsq = y = ysq = z = zsq = xy = xz = yz = 0.0;
    for (na = 0; na < npt; na++ ) {
        cx = coo + 3 * ( aplane[ na ] - 1 );
        x += *cx;
        xsq += (*cx) * (*cx);
        cy = cx + 1;
        y += *cy;
        ysq += (*cy) * (*cy);
        cz = cy + 1;
        z += *cz;
        zsq += (*cz) * (*cz);
        xy += (*cx) * (*cy);
        xz += (*cx) * (*cz);
        yz += (*cy) * (*cz);
    }

    cent[0] = x / (double) npt;
    cent[1] = y / (double) npt;
    cent[2] = z / (double) npt;

    mat[0][0] = xsq - x * cent[0];
    mat[0][1] = xy - x * cent[1];
    mat[0][2] = xz - x * cent[2];
    mat[1][0] = xy - y * cent[0];
    mat[1][1] = ysq - y * cent[1];
    mat[1][2] = yz - y * cent[2];
    mat[2][0] = xz - z * cent[0];
    mat[2][1] = yz - z * cent[1];
    mat[2][2] = zsq - z * cent[2];

    /* calculate the plane */
    if (!UTL_GEOM_SYMM_EIGENSYS( (double *) mat, 3, eval, (double *) evec, &nrot )) goto error;

    l = evec[0][0];
    m = evec[1][0];
    n = evec[2][0];
    d = (l * cent[0] + m * cent[1] + n * cent[2]);

    /* perform reflection on the input coordinate sets */
    elem = -1;
    while ( (elem = UTL_SET_NEXT( atms, elem )) >= 0 ) {
        xyz = coo + (elem - 1) * 3;
        h = l * xyz[0] + m * xyz[1] + n * xyz[2] - d;
        xyz[0] -= 2.0 * l * h;
        xyz[1] -= 2.0 * m * h;
        xyz[2] -= 2.0 * n * h;
    }

    return TRUE;
error:
    return FALSE;
}
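
The reflection step at the end of TOP_REFLECT_COO can be shown on its own. The sketch below assumes the plane is already known as a unit normal n and offset d (so that n.x = d on the plane), whereas the listing obtains the normal from an eigenvector of the scatter matrix. The name demo_reflect_point is hypothetical.

```c
/* Reflects the point p in place through the plane n.x = d.
   Assumption: n is a unit normal vector. */
static void demo_reflect_point(double p[3], const double n[3], double d)
{
    /* signed distance of p from the plane */
    double h = n[0] * p[0] + n[1] * p[1] + n[2] * p[2] - d;
    int k;

    /* move p by twice that distance along the normal */
    for (k = 0; k < 3; k++)
        p[k] -= 2.0 * n[k] * h;
}
```

For the plane z = 0 (n = (0, 0, 1), d = 0) this simply negates the z coordinate, which is a convenient sanity check.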

static int reflectAtoms( double *coo, int nAtoms, int npt, int *aplane )
/* ====================================================================== */
/* reflects atms through the plane defined by the atoms whose indexes (base 0) are in aplane, by
   modifying values in coo */
{
    double cent[3], eval[3], evec[3][3], mat[3][3], x, xsq, xy, xz,
           y, ysq, yz, z, zsq, *cx, *cy, *cz, l, m, n, d, *xyz, h;
    int na, nrot, elem;
    int *dn;

    if ( npt >= 3 )
        dn = findDirectionalNeighbors(g_ct, aplane[1], aplane[0], aplane[2] );
    else
        return FALSE;

    /* Now perform the sums to determine the parameters of the plane */
    /* equation. */
    x = xsq = y = ysq = z = zsq = xy = xz = yz = 0.0;
    for (na = 0; na < npt; na++ ) {
        cx = coo + 3 * ( aplane[ na ] );
        x += *cx;
        xsq += (*cx) * (*cx);
        cy = cx + 1;
        y += *cy;
        ysq += (*cy) * (*cy);
        cz = cy + 1;
        z += *cz;
        zsq += (*cz) * (*cz);
        xy += (*cx) * (*cy);
        xz += (*cx) * (*cz);
        yz += (*cy) * (*cz);
    }

    cent[0] = x / (double) npt;
    cent[1] = y / (double) npt;
    cent[2] = z / (double) npt;

    mat[0][0] = xsq - x * cent[0];
    mat[0][1] = xy - x * cent[1];
    mat[0][2] = xz - x * cent[2];
    mat[1][0] = xy - y * cent[0];
    mat[1][1] = ysq - y * cent[1];
    mat[1][2] = yz - y * cent[2];
    mat[2][0] = xz - z * cent[0];
    mat[2][1] = yz - z * cent[1];
    mat[2][2] = zsq - z * cent[2];

    /* calculate the plane */
    if (!UTL_GEOM_SYMM_EIGENSYS( (double *) mat, 3, eval, (double *) evec, &nrot )) goto error;

    l = evec[0][0];
    m = evec[1][0];
    n = evec[2][0];
    d = (l * cent[0] + m * cent[1] + n * cent[2]);

    /* perform reflection on the input coordinate sets */
    elem = -1;
    for ( elem = 0; elem < nAtoms; elem++ )
    {
        if ( dn[elem] <= 0 )
            continue;
        xyz = coo + (elem * 3);
        h = l * xyz[0] + m * xyz[1] + n * xyz[2] - d;
        xyz[0] -= 2.0 * l * h;
        xyz[1] -= 2.0 * m * h;
        xyz[2] -= 2.0 * n * h;
    }

    if ( dn ) free((char *) dn );
    return TRUE;

error:
    if ( dn ) free((char *) dn );
    return FALSE;
}

static int setTorsion(double *coo, int nAtoms, int a1, int a2, int a3, int a4, double value )
/* rotates atoms to the value for the torsional angle defined by a1,a2,a3,a4, by modifying values in coo
*/
{
    double angle, delta, matrix[3][3];
    int elem;
    int *dn;

    dn = findDirectionalNeighbors(g_ct, a3, a2, -1 );
    angle = UTL_GEOM_TAU( coo+(a1*3), coo+(a2*3), coo+(a3*3), coo+(a4*3) );
    if (UTL_ERROR_IS_SET()) UTL_ERROR_CLEAR();
    if (angle < 0.0) angle += 360.0;

    while (value < 0.0)
        value += 360.0;
    while (value > 360.0)
        value -= 360.0;

    delta = angle - value;
    UTL_GEOM_MFORM( coo+(a2*3), coo+(a3*3), delta, matrix );
    for ( elem = 0; elem < nAtoms; elem++ )
    {
        if ( dn[elem] > 0 )
            UTL_GEOM_ROTATE( coo+(a3*3), matrix, coo+(elem*3) );
    }

    free((char *) dn );
    return 1;
}


static int setRootTorsion(double *coo, int nAtoms, int a2, int a3, int a4, double value )
/* rotates atoms to the value for the torsional angle defined by a1,a2,a3,a4, by modifying values in coo
*/
{
    double angle, delta, matrix[3][3];
    double cord1[3];
    double cord2[3];
    int elem;

    cord1[0] = -1.802;
    cord1[1] = 1.666;
    cord1[2] = 0.0;

    if ( q_coremode_align )
        cord2[0] = -2.004;
    else
        cord2[0] = -0.504;
    cord2[1] = 1.424;
    cord2[2] = 0.0;

    angle = UTL_GEOM_TAU( cord2, coo+(a2*3), coo+(a3*3), coo+(a4*3) );
    if (UTL_ERROR_IS_SET()) UTL_ERROR_CLEAR();
    if (angle < 0.0) angle += 360.0;

    while (value < 0.0)
        value += 360.0;
    while (value > 360.0)
        value -= 360.0;

    delta = angle - value;
#ifdef DEBUGDETAIL
    if ( q_debugfp )
        fprintf(q_debugfp, "# root value: %8.3lf %6.0lf %8.3lf\n", angle, value, delta );
#endif

    UTL_GEOM_MFORM( coo+(a2*3), coo+(a3*3), delta, matrix );
    elem = -1;
    for ( elem = 0; elem < nAtoms; elem++ )
        UTL_GEOM_ROTATE( coo+(a3*3), matrix, coo+(elem*3) );
    return 1;
}

static int setBaseTorsion(double *coo, int nAtoms, int a3, int a4, double value )
/* rotates atoms to the value for the torsional angle defined by a1,a2,a3,a4, by modifying values in coo
*/
{
    double angle, delta, matrix[3][3];
    double cord1[3];
    double cord2[3];
    int elem;

    cord1[0] = -1.802;
    cord1[1] = 1.666;
    cord1[2] = 0.0;
    cord2[0] = -0.504;
    cord2[1] = 1.424;
    cord2[2] = 0.0;

    angle = UTL_GEOM_TAU( cord1, cord2, coo+(a3*3), coo+(a4*3) );
    if (UTL_ERROR_IS_SET()) UTL_ERROR_CLEAR();
    if (angle < 0.0) angle += 360.0;

    while (value < 0.0)
        value += 360.0;
    while (value > 360.0)
        value -= 360.0;

    delta = angle - value;
    UTL_GEOM_MFORM( cord2, coo+(a3*3), delta, matrix );
    elem = -1;
    for ( elem = 0; elem < nAtoms; elem++ )
        UTL_GEOM_ROTATE( coo+(a3*3), matrix, coo+(elem*3) );
    return 1;
}

int TOP_SET_TORSION( double *coo, set_ptr atms, int a1, int a2, int a3, int a4, double value )
/* ====================================================================== */
/* rotates atms to the value for the torsional angle defined by a1,a2,a3,a4, by modifying values in coo
*/
{
    double angle, delta, matrix[3][3];
    int elem;

    angle = UTL_GEOM_TAU( coo+(a1-1)*3, coo+(a2-1)*3, coo+(a3-1)*3, coo+(a4-1)*3 );
    if (UTL_ERROR_IS_SET()) goto error;
    if (angle < 0.0) angle += 360.0;

    while (value < 0.0)
        value += 360.0;
    while (value > 360.0)
        value -= 360.0;

    delta = angle - value;
    UTL_GEOM_MFORM( coo+(a2-1)*3, coo+(a3-1)*3, delta, matrix );
    elem = -1;
    while ((elem = UTL_SET_NEXT( atms, elem )) > 0)
        UTL_GEOM_ROTATE( coo+(a3-1)*3, matrix, coo+(elem-1)*3 );

    return( TRUE );

error:
    return( FALSE );
}

int TOP_ALIGN_MOL( double *coo, int natms, int a1, int a2, int a3 )
/* rotates and translates all coordinates so that a1 is at origin, a2 lies along x axis, and a3 lies in the xy
   plane */
{
    double matrix[3][3], tv[3], u[3], *c;
    int i, nc;

    if (!UTL_GEOM_ALIGN( coo+(a1-1)*3, coo+(a2-1)*3, coo+(a1-1)*3, coo+(a3-1)*3, matrix )) goto error;

    if ( q_coremode_align )
        c = coo+(a2-1)*3;
    else
        c = coo+(a1-1)*3;

    for (i = 0; i < 3; i++, c++) {
        u[i] = *c;
        tv[i] = -u[i];
    }

    for (nc = 0, c = coo; nc < natms; nc++ ) {
        UTL_GEOM_ROTATE( u, matrix, c );
        for (i = 0; i < 3; i++, c++) *c += tv[ i ];
    }

    return TRUE;
error:
    return FALSE;
}

/* New code Sept, 2000 */

/*
   FindBreakPoints - takes in a ct and returns an array the size of the number
   of bonds in the ct. Each cell indicates true or false if this is a break point bond

   break points: are single bonds with at least N heavy atoms on each side of the
   attachment, not in a ring, and optionally they can be terminal atoms

   int minHev - optional argument which forces at least N hev atoms for this to
   be a breakpoint bond.

   int termflag - if true the heavy atoms can be terminal heavy atoms, for example Fl, Br, Cl

   Author: Rob Jilek Sept, 2000
*/

static Split *FindBreakPoints(CtConnectionTable *ct, int minHev, int termflag, int createFrags )
{
    int *bdata;
    int *singleBonds;
    int *bptr;
    CtBond *bondp;
    int idx;
    int *rb1, *rb2;
    int *atomMask;
    int hevCnt;
    int hevDiff;
    Split *S;
    int bcnt;
    CtBondTypeDef bondType;
    CtSimpleBondTypeDef simpleTypes;

#ifdef DEBUG_VALID_B
    fprintf(stdout,"new breakpoints minHev: %d Allow term: %s\n",
            minHev, (termflag) ? "Yes" : "No" );
#endif

    S = (Split *) 0;
    if ( !ct || !ct->bondCount )
        return S;

    atomMask = createAtomMask(ct, termflag, &hevCnt);

    /* filter on the heavy atom count difference from the query */
    if ( !q_coremode && qs && q_hevDiff >= 0 )
    {
        hevDiff = abs(hevCnt - qs->numHev);
        if ( hevDiff > q_hevDiff )
        {
            if ( createFrags )
                t_filtered++;
            free((char *) atomMask );
            return S;
        }
    }

    if ( hevCnt < (minHev*2) )
    {
        free((char *) atomMask );
        return S;
    }

    bdata = (int *) calloc(ct->bondCount, sizeof(int) );
    singleBonds = (int *) calloc(ct->bondCount, sizeof(int) );
    S = (Split *) calloc(1, sizeof(Split) );

    for ( idx = 0, bondp = ct->bonds;
          idx < ct->bondCount;
          idx++, bondp++ )
    {
        if ( !( bondp->simpleBondType == CtSimpleBondTypeSingle ||
                bondp->simpleBondType == CtSimpleBondTypeNotSimple ) )
            continue;   /* must be single, check NotSimple next. */

        if ( bondp->simpleBondType == CtSimpleBondTypeNotSimple )
        {
            bondType = DB_CT_GET_BOND_TYPE(ct, STD_ID(idx), &bcnt,
                                           &simpleTypes );
            if ( bondType != CtBondTypeSingle )
                continue;
        }

        if ( AB_IN_RING(bondp) )
            continue;
        singleBonds[idx] = 1;

        if ( minHev > 0 && !validBreakPoint(ct, idx, atomMask, minHev, termflag, &rb1, &rb2 ) )
            continue;
        if ( createFrags )
            addSplit2(idx, rb1, rb2 );
        else
        {
            free((char *) rb1 );
            free((char *) rb2 );
            S->s2cnt++;
        }
        bdata[idx] = 1;   /* found a good one */
    }

    if ( createFrags && ( q_do3piece || q_doSubset ) && hevCnt >= (minHev*3) )
        makeSplit3(ct, atomMask, g_split2, g_splitcnt, minHev );

    if ( createFrags )
        S->frags = createUniqFrags(ct->atomCount, g_split2, g_splitcnt, g_split3, g_split3Cnt,
                                   atomMask,
                                   &(S->numFrags) );
    S->numHev = hevCnt;

#ifdef DEBUG_VALID_BXX
    fprintf(stdout, "bonds (base0): ");
    for ( idx = 0;
          idx < ct->bondCount;
          idx++ )
    {
        if ( bdata[idx] )
            fprintf(stdout,"%d ", idx );
    }
    fprintf(stdout, "\n");
#endif

    if ( createFrags )
    {
        S->s2 = g_split2;
        S->s3 = g_split3;
        S->s2cnt = g_splitcnt;
        S->s3cnt = g_split3Cnt;
    }

    S->bondCount = ct->bondCount;
    S->atomCount = ct->atomCount;
    S->bondMask = bdata;
    S->atomMask = atomMask;
    S->singleBonds = singleBonds;
    S->aromSets = (AromSet *) 0;

    /* ownership of the split arrays has passed to S; reset the globals */
    g_split2 = (split2 *) 0;
    g_split3 = (split3 *) 0;
    g_splitcnt = g_splitalloc = g_split3Cnt = g_split3Alloc = 0;

    return S;
}
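
The break point criterion described for FindBreakPoints (at least minHev heavy atoms on each side of an acyclic single bond) can be sketched on a toy molecule. This is not the listing's validBreakPoint, whose body is not shown here. The DemoMol structure and the demo_ functions are hypothetical, and the sketch assumes the bond has already passed the single-bond and ring tests and that the molecule has at most DEMO_MAX_ATOMS atoms.

```c
#define DEMO_MAX_ATOMS 32

/* Hypothetical minimal molecule: bond k joins atoms from[k] and to[k]. */
typedef struct {
    int atomCount;
    int bondCount;
    const int *from;
    const int *to;
    const int *isHeavy;   /* 1 for non-hydrogen atoms, 0 otherwise */
} DemoMol;

/* Counts the heavy atoms reachable from start without crossing bond skip. */
static int demo_side_heavy(const DemoMol *m, int start, int skip)
{
    int seen[DEMO_MAX_ATOMS] = {0};
    int stack[DEMO_MAX_ATOMS];
    int top = 0, count = 0, k;

    seen[start] = 1;
    stack[top++] = start;
    while (top > 0) {
        int a = stack[--top];
        if (m->isHeavy[a])
            count++;
        for (k = 0; k < m->bondCount; k++) {
            int other;
            if (k == skip)
                continue;
            if (m->from[k] == a)
                other = m->to[k];
            else if (m->to[k] == a)
                other = m->from[k];
            else
                continue;
            if (!seen[other]) {
                seen[other] = 1;
                stack[top++] = other;
            }
        }
    }
    return count;
}

/* True when both sides of the bond carry at least minHev heavy atoms. */
static int demo_is_break_point(const DemoMol *m, int bond, int minHev)
{
    return demo_side_heavy(m, m->from[bond], bond) >= minHev &&
           demo_side_heavy(m, m->to[bond], bond) >= minHev;
}
```

On a five-atom chain with minHev = 2, only the two interior bonds qualify, since each terminal bond leaves a single heavy atom on one side.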
static void freeSplit(Split *s)
{
    int i;
    AromSet *aset;

    if ( !s )
        return;

    freeSplit2(s->s2, s->s2cnt);
    freeSplit3(s->s3, s->s3cnt);
    freeFrags(s->frags, s->numFrags);
    if ( s->bondMask )
        free((char *) s->bondMask );
    if ( s->atomMask )
        free((char *) s->atomMask );
    if ( s->singleBonds )
        free((char *) s->singleBonds );
    if ( s->featureMask )
        free((char *) s->featureMask );
    if ( s->aromMask )
        free((char *) s->aromMask );
    if ( s->aromSets )
    {
        for ( i = 0, aset = s->aromSets; i < s->numArom; i++, aset++ )
            free((char *) aset->atoms );
        free((char *) s->aromSets );
    }
    free((char *) s );
}


static void freeSplit2(split2 *s2, int cnt )
{
    split2 *sptr;
    int i;

    if ( !s2 )
        return;

    for ( i = 0, sptr = s2; i < cnt; sptr++, i++ )
    {
        free((char *) sptr->b1 );
        free((char *) sptr->b2 );
    }
    free((char *) s2 );
}


static void freeSplit3(split3 *s3, int cnt )
{
    split3 *sptr;
    int i;

    if ( !s3 )
        return;

    for ( i = 0, sptr = s3; i < cnt; sptr++, i++ )
    {
        free((char *) sptr->b1 );
        free((char *) sptr->b2 );
        free((char *) sptr->b3 );
        if ( sptr->b4 )
            free((char *) sptr->b4 );
    }
    free((char *) s3 );
}

static void freeFrags(Frag *f, int cnt )
{
    Frag *fptr;
    int i, j;

    for ( i = 0, fptr = f; i < cnt; i++, fptr++ )
    {
#ifdef USE_HEX
        if ( fptr->topHex )
            free(fptr->topHex );
        if ( fptr->topInt )
            free((char *) fptr->topInt );
#endif
#ifdef STDREGION
        if ( fptr->stdField )
            free((char *) fptr->stdField );
#endif
        if ( fptr->hexDiff )
            free((char *) fptr->hexDiff );
        if ( fptr->featureDiff )
            free((char *) fptr->featureDiff );
        if ( fptr->ct )
            DB_CT_DELETE_CT(fptr->ct);
        else if ( fptr->cords )
            free((char *) fptr->cords );   /* if the ct exists, then coords is a pointer into the
                                              ct's coordinates */
        if ( fptr->origMapping )
            free((char *) fptr->origMapping );
        if ( fptr->cent )
            free((char *) fptr->cent );
        if ( fptr->AtWts )
            free((char *) fptr->AtWts );
        for ( j = 0; j < max_regions; j++ )
        {
            if ( fptr->qtf[j] && fptr->qtf[j] != fptr->topField )
                free((char *) fptr->qtf[j] );
        }
        if ( fptr->topField )
            free((char *) fptr->topField );
    }
    free((char *) f );
}

static void freeFragCts(Split *S)
{
    Frag *fptr;
    int i, j;
    double *coords;

    for ( i = 0, fptr = S->frags; i < S->numFrags; i++, fptr++ )
    {
        if ( fptr->ct && fptr->cords )
        {
            /* keep a private copy of the coordinates before the ct is deleted */
            coords = (double *) malloc(fptr->ct->atomCount * sizeof(double) * 3 );
            memcpy((char *) coords, fptr->cords, sizeof(double) * fptr->ct->atomCount * 3 );
            fptr->cords = coords;
            DB_CT_DELETE_CT(fptr->ct);
            fptr->ct = (struct CtConnectionTable *) 0;
        }
    }
}

static int freeStrMap(Split *S)
{
    split2 *s2;
    split3 *s3;
    int i;

#ifdef NO_STRMAP
    return -1;
#else
    if ( !S )
        return 0;

    for ( i = 0, s2 = S->s2; i < S->s2cnt; i++, s2++ )
    {
        if ( s2->strMap )
        {
            free((char *) s2->strMap );
            s2->strMap = (int *) 0;
        }
    }
    S->alloc2Map = 0;

    for ( i = 0, s3 = S->s3; i < S->s3cnt; i++, s3++ )
    {
        if ( s3->strMap )
        {
            free((char *) s3->strMap );
            s3->strMap = (int *) 0;
        }
    }
    S->alloc3Map = 0;
#endif
}

static int addSplit2(int bondId, int *b1, int *b2 )
{
    split2 *s;

    /* grow the global split2 array when it is full */
    if ( g_splitcnt >= g_splitalloc )
    {
        if ( g_split2 && g_splitalloc )
        {
            g_split2 = (split2 *) realloc((char *) g_split2, g_splitalloc * 2 * sizeof(split2) );
            g_splitalloc *= 2;
        }
        else
        {
            g_splitalloc = 3;
            g_split2 = (split2 *) calloc(sizeof(split2), g_splitalloc );
        }
    }

    s = g_split2 + g_splitcnt;
    s->bondId = bondId;
    s->b1 = b1;
    s->b2 = b2;
#ifndef NOSTRMAP
    s->strMap = (int *) 0;
#endif
    g_splitcnt++;
}

static int printBondArray(int atomCnt, int *b)
{
    int i;

    for ( i = 0; i < atomCnt; i++ )
    {
        fprintf(stdout,"%2d ", b[i] );
    }
    fprintf(stdout,"\n");
}

static int addSplit3(int atomCnt, int bond1, int bond2, int *b1, int *b2, int *b3, int firstBase, int
                     secondBase )
{
    split3 *s;

    /* grow the global split3 array when it is full */
    if ( g_split3Cnt >= g_split3Alloc )
    {
        if ( g_split3 && g_split3Alloc )
        {
            g_split3 = (split3 *) realloc((char *) g_split3, g_split3Alloc * 2 * sizeof(split3) );
            g_split3Alloc *= 2;
        }
        else
        {
            g_split3Alloc = 2;
            g_split3 = (split3 *) calloc(sizeof(split3), g_split3Alloc );
        }
    }

    s = g_split3 + g_split3Cnt;
    s->bond1 = bond1;
    s->bond2 = bond2;
#ifndef NOSTRMAP
    s->strMap = (int *) 0;
#endif

    s->b1 = (int *) malloc(sizeof(int) * atomCnt );
    s->b2 = (int *) malloc(sizeof(int) * atomCnt );
    s->b3 = (int *) malloc(sizeof(int) * atomCnt );
    memcpy((char *) s->b1, (char *) b1, sizeof(int) * atomCnt );
    memcpy((char *) s->b2, (char *) b2, sizeof(int) * atomCnt );
    memcpy((char *) s->b3, (char *) b3, sizeof(int) * atomCnt );

    s->b4 = (int *) malloc(sizeof(int) * atomCnt );
    memcpy((char *) s->b4, (char *) b1, sizeof(int) * atomCnt );
    if ( firstBase >= 0 && secondBase >= 0 )
    {
        s->b4[firstBase] = 1;
        s->b4[secondBase] = -1;   /* this is the base for query */
    }

    g_split3Cnt++;
}
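
Both addSplit2 and addSplit3 append to a global array that doubles in capacity when full. A minimal standalone sketch of that pattern follows. Unlike the listing, it keeps the array in a struct rather than in globals and checks the result of realloc before overwriting the old pointer. DemoItem, DemoVec and demo_vec_push are hypothetical names.

```c
#include <stdlib.h>

typedef struct {
    int bondId;           /* hypothetical payload */
} DemoItem;

typedef struct {
    DemoItem *items;
    int count;
    int alloc;
} DemoVec;

/* Appends one item, doubling the capacity when the array is full.
   Returns 1 on success and 0 if the allocation fails. */
static int demo_vec_push(DemoVec *v, int bondId)
{
    if (v->count >= v->alloc) {
        int newAlloc = v->alloc > 0 ? v->alloc * 2 : 3;
        DemoItem *p = (DemoItem *) realloc(v->items, (size_t) newAlloc * sizeof(DemoItem));

        if (!p)
            return 0;     /* the old block is still valid and still owned by v */
        v->items = p;
        v->alloc = newAlloc;
    }
    v->items[v->count].bondId = bondId;
    v->count++;
    return 1;
}
```

Starting from an empty vector with an initial capacity of 3, ten pushes leave the capacity at 12 after two doublings.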

/* returns a true value if the atom* arrays overlap and the anchor is contained
   within b1. It returns the index + 1 (base 1) indexed into b1
*/
static int atomsOverlap(int atomcnt, int *b1, int *b2)
{
    int i;
    int overlap = 0;

    for ( i = 0; i < atomcnt; i++ )
    {
        if ( b1[i] == 1 && b2[i] )
            return i+1;
    }
    return 0;
}

static Frag *createUniqFrags(int atomCnt, split2 *s2, int nums2, split3 *s3, int nums3, int *atomMask,
                             int *r_numFrags )
{
    int i;
    split2 *s2ptr;
    split3 *s3ptr;
    Frag *fragHead;
    int no2p;

    g_fragHead = (Frag *) 0;
    g_fragCnt = g_fragAlloc = 0;

    if ( q_coremode == 0 )
        g_fragAlloc = (nums2*2) + (nums3*2);
    else
        g_fragAlloc = nums3*2;
    if ( g_fragAlloc > 0 )
        g_fragHead = (Frag *) calloc(sizeof(Frag), g_fragAlloc );

    no2p = 0;
    if ( !q_coremode || qmode )
    {
        for ( i = 0, s2ptr = s2; i < nums2; i++, s2ptr++ )
        {
            s2ptr->frag1 = createFrag(atomCnt, s2ptr->b1, atomMask, 0 );
            s2ptr->frag2 = createFrag(atomCnt, s2ptr->b2, atomMask, 0 );
        }
    }

    no2p = g_fragCnt;
    if ( q_coremode == 0 )
    {
        for ( i = 0, s3ptr = s3; i < nums3; i++, s3ptr++ )
        {
            s3ptr->frag1 = createFrag(atomCnt, s3ptr->b1, atomMask, 0 );
            s3ptr->frag2 = createFrag(atomCnt, s3ptr->b2, atomMask, 1 );
            s3ptr->frag3 = createFrag(atomCnt, s3ptr->b3, atomMask, 1 );
            s3ptr->frag4 = createFrag(atomCnt, s3ptr->b4, atomMask, 0 );
        }
    }
    else
    {
        for ( i = 0, s3ptr = s3; i < nums3; i++, s3ptr++ )
        {
            s3ptr->frag1 = createFrag(atomCnt, s3ptr->b1, atomMask, 0 );   /* b1 and b4
                                                                              are the center pieces */
            s3ptr->frag2 = createFrag(atomCnt, s3ptr->b4, atomMask, 0 );
        }
    }

    if ( q_debugfp )
        fprintf(q_debugfp, "# There are %d uniq 2D fragments and %d 3D\n", no2p, g_fragCnt
                - no2p );

    tot_frags += nums2 * 2 + nums3 * 3;
    tot_uniq_frags += g_fragCnt;
    compounds++;

    fragHead = g_fragHead;
    *r_numFrags = g_fragCnt;

    g_fragHead = (Frag *) 0;
    g_fragCnt = g_fragAlloc = 0;

    return fragHead;
}


int dump_frag_stats(void)
{
    fprintf(stderr,"AVG uniq frags: %8.3lf AVG frags: %8.3lf # structures for which fragments were built : %d\n",
            (double) ((double) tot_uniq_frags / (double) compounds),
            (double) ((double) tot_frags / (double) compounds),
            compounds);
}

static int masksMatch(int cnt, int *m1, int *m2 )
{
    int rc;

    rc = !memcmp((char *) m1, (char *) m2, sizeof(int) * cnt );
    return rc;
}

static int createFrag(int atomCnt, int *atoms, int *atomMask, int checkDup )
{
    int i, j, found;
    Frag *curr;
    int numAtoms, hevAtoms;
    int baseAtom;

    hevAtoms = hevCount(atomCnt, atoms, atomMask, &numAtoms );

    /* the base atom is flagged with -1 in the membership array */
    for ( i = 0, baseAtom = -1; i < atomCnt; i++ )
    {
        if ( atoms[i] == -1 )
        {
            baseAtom = i;
            break;
        }
    }

    if ( baseAtom == -1 )
    {
        fprintf(stderr,"base atom not found\n");
        for ( i = 0; i < atomCnt; i++ )
            fprintf(stderr,"%d ", atoms[i] );
        fprintf(stderr, "\n");
        return -1;
    }

    if ( checkDup )
    {
        for ( j = 0, curr = g_fragHead; j < g_fragCnt; j++, curr++ )
        {
            if ( curr->baseAtom == baseAtom && curr->atomCnt == numAtoms &&
                 curr->hevCnt == hevAtoms && masksMatch(atomCnt, curr->atoms,
                                                        atoms) )
            {
                return curr->id;
            }
        }
    }

    if ( g_fragCnt >= g_fragAlloc )
    {
#if 0
        fprintf(stderr,"%d %d\n", g_fragCnt, g_fragAlloc );
#endif
        fflush(stderr);
        if ( g_fragHead && g_fragAlloc )
        {
            g_fragAlloc *= 2;
            g_fragHead = (Frag *) realloc((char *) g_fragHead, g_fragAlloc * sizeof(Frag) );
        }
        else
        {
            g_fragAlloc = 20;
            g_fragHead = (Frag *) calloc(sizeof(Frag), g_fragAlloc );
        }
    }

    curr = g_fragHead + g_fragCnt;
    memset((char *) curr, '\0', sizeof(Frag) );
    curr->baseAtom = baseAtom;
    curr->atomCnt = numAtoms;
    curr->hevCnt = hevAtoms;
    curr->atoms = atoms;
    curr->id = g_fragCnt;
    curr->aromCnt = -1;   /* Indicate not computed */
    g_fragCnt++;

    return curr->id;
}
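
When checkDup is set, createFrag returns the id of an existing fragment whose atom membership array is identical instead of storing a duplicate. The sketch below shows that idea in reduced form. It compares only the mask (the listing also compares the base atom and the atom counts first) and uses a fixed-size table. All demo_ names are hypothetical.

```c
#include <string.h>

#define DEMO_MAX_FRAGS 16

typedef struct {
    const int *atoms;     /* membership mask, one int per atom */
    int id;
} DemoFrag;

static DemoFrag demo_frags[DEMO_MAX_FRAGS];
static int demo_frag_count = 0;

/* Returns the id of a stored fragment with an identical mask, or stores
   the mask as a new fragment. Returns -1 when the table is full.
   The caller keeps ownership of atoms and must keep it alive. */
static int demo_intern_frag(int atomCnt, const int *atoms)
{
    int j;

    for (j = 0; j < demo_frag_count; j++) {
        if (memcmp(demo_frags[j].atoms, atoms, sizeof(int) * (size_t) atomCnt) == 0)
            return demo_frags[j].id;
    }
    if (demo_frag_count >= DEMO_MAX_FRAGS)
        return -1;
    demo_frags[demo_frag_count].atoms = atoms;
    demo_frags[demo_frag_count].id = demo_frag_count;
    return demo_frag_count++;
}
```

Because the comparison is a byte-wise memcmp, two masks match only if every entry is equal, including the -1 that marks the base atom.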


static int hevCount(int atomcnt, int *b, int *atomMask, int *r_numAtoms )
{
    int hevCnt;
    int numAtoms;
    int i;

    for ( i = hevCnt = numAtoms = 0; i < atomcnt; i++ )
    {
        if ( b[i] )
        {
            numAtoms++;
            if ( atomMask[i] )
                hevCnt++;
        }
    }

    *r_numAtoms = numAtoms;
    return hevCnt;
}



static int makeSplit3(CtConnectionTable *ct, int *atomMask, split2 *sall, int cnt, int minHev )
{
    int i, j, k;
    split2 *s1, *s2;
    CtBond *b1, *b2;
    int *inBoth;
    int *subset1;
    int *subset2;
    int *subset3;
    int *remaining;
    int overlap1, overlap2;
    int numAtoms;
    int numHev;
    int firstBase, secondBase;

    /* examine every pair of two-piece splits */
    for ( i = 0; i < cnt; i++ )
    {
        s1 = sall + i;
        b1 = ct->bonds + s1->bondId;
        for ( j = i+1; j < cnt; j++ )
        {
            s2 = sall + j;
            b2 = ct->bonds + s2->bondId;
            firstBase = secondBase = -1;

            overlap1 = atomsOverlap(ct->atomCount, s1->b1, s2->b1);
            overlap2 = atomsOverlap(ct->atomCount, s1->b2, s2->b1);
            if ( !overlap1 || !overlap2 )
            {
                overlap1 = atomsOverlap(ct->atomCount, s1->b1, s2->b2);
                overlap2 = atomsOverlap(ct->atomCount, s1->b2, s2->b2);
                if ( !overlap1 || !overlap2 )
                {
                    continue;
                }
                inBoth = s2->b2;
                subset3 = s2->b1;
            }
            else
            {
                inBoth = s2->b1;
                subset3 = s2->b2;
            }

            if ( inBoth[overlap1 - 1] < inBoth[overlap2 - 1] )
            {
                subset2 = s1->b2;
                remaining = s1->b1;
            }
            else
            {
                subset2 = s1->b1;
                remaining = s1->b2;
            }

            subset1 = (int *) calloc(sizeof(int), ct->atomCount );
#ifdef SPLIT_DEBUG
            if ( q_debugfp )
                fprintf(q_debugfp,"# ");
#endif
            for ( k = 0; k < ct->atomCount; k++ )
            {
                if ( remaining[k] && inBoth[k] )
                {
                    subset1[k] = remaining[k];
                    if ( inBoth[k] == -1 )
                        secondBase = k;
                    if ( remaining[k] == -1 )
                        firstBase = k;
                }
#ifdef SPLIT_DEBUG
                if ( q_debugfp )
                    fprintf(q_debugfp,"%d ", subset1[k] );
#endif
            }
#ifdef SPLIT_DEBUG
            if ( q_debugfp )
            {
                fprintf(q_debugfp,"\n");
                for ( k = 0; k < ct->atomCount; k++ )
                {
                    if ( inBoth[k] == -1 )
                        fprintf(q_debugfp, "# inBoth: %d\n", k );
                }
            }
#endif

#if 0
            numHev = hevCount(ct->atomCount, subset1, atomMask, &numAtoms);
            numHev -= 2;   /* subtract out the attachment atoms */
            if ( numHev < minHev )
            {
                free((char *) subset1);
                continue;
            }

            fprintf(stdout, "3 piece set\n");
            printBondArray(ct->atomCount, s1->b1);
            printBondArray(ct->atomCount, s1->b2);
            printBondArray(ct->atomCount, s2->b1);
            printBondArray(ct->atomCount, s2->b2);
            printBondArray(ct->atomCount, subset1);
            printBondArray(ct->atomCount, subset2);
            printBondArray(ct->atomCount, subset3);
            fprintf(stdout,"\n");
#endif

            addSplit3(ct->atomCount, s1->bondId, s2->bondId, subset1, subset2, subset3,
                      firstBase, secondBase );
            free((char *) subset1 );
        }
    }

    return g_split3Cnt;
}

static int *findDirectionalNeighbors(CtConnectionTable *ct, int atomIdx, int terminalAtomIdx, int
                                     termIdx2 )
/*
   think of the arguments as: ct, to, from
   or
   from the atom (atomIdx) find atoms down the paths except for the terminal atoms

   For example: C is the atom you're interested in,
   and you want to find the atoms going down the paths connected to atoms 3 and 4, so you block 1 and
   2 as terminal.
*/
{
    CtAtom *A;
    CtAtomBondData *bond;
    int *covered;
    int added;
    int level;
    int toAtom;
    int i, j;

    if ( atomIdx < 0 || atomIdx >= ct->atomCount )
        return (int *) 0;
    if ( terminalAtomIdx >= ct->atomCount )
        return (int *) 0;
    if ( termIdx2 >= ct->atomCount )
        return (int *) 0;

    A = ct->atoms + atomIdx;   /* index is zero based */
    covered = (int *) calloc(ct->atomCount, sizeof(int) );
    covered[atomIdx] = 1;
    if ( terminalAtomIdx >= 0 )
        covered[terminalAtomIdx] = -1;   /* -1 means do not cross this atom, it is the
                                            anchor/terminal atom */
    if ( termIdx2 >= 0 )
        covered[termIdx2] = -1;          /* -1 means do not cross this atom, it is the
                                            anchor/terminal atom */

    /* expand outward one bond at a time; covered[i] holds the level at which atom i was reached */
    added = 1;
    for ( level = 1; added && level <= ct->atomCount; level++ )
    {
        for ( i = added = 0; i < ct->atomCount; i++ )
        {
            if ( covered[i] == level )
            {
                A = ct->atoms + i;
                for ( j = 0, bond = A->bond; j < A->bondCount; j++, bond++ )
                {
                    toAtom = bond->toAtom;
                    if ( covered[ toAtom ] )
                        continue;
                    covered[toAtom] = level + 1;
                    added++;
                }
            }
        }
    }

    return covered;
}
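
The level-by-level expansion in findDirectionalNeighbors is a breadth-first traversal in which blocked atoms are pre-marked with -1 so that they are never crossed. A minimal sketch on a plain adjacency list follows. The adjacency representation and the names demo_levels and DEMO_MAX_NEIGHBORS are hypothetical and stand in for the CtConnectionTable structures of the listing.

```c
#define DEMO_MAX_NEIGHBORS 4

/* Fills covered[] with the level at which each atom is reached from start:
   1 for start itself, 2 for its neighbors, and so on. Atoms that are never
   reached stay 0, and the blocked atom (if blocked >= 0) is marked -1.
   nbr[i] holds the deg[i] neighbor indices of atom i. */
static void demo_levels(int atomCount, const int deg[],
                        int nbr[][DEMO_MAX_NEIGHBORS],
                        int start, int blocked, int covered[])
{
    int i, j, level, added;

    for (i = 0; i < atomCount; i++)
        covered[i] = 0;
    covered[start] = 1;
    if (blocked >= 0)
        covered[blocked] = -1;

    added = 1;
    for (level = 1; added && level <= atomCount; level++) {
        added = 0;
        for (i = 0; i < atomCount; i++) {
            if (covered[i] != level)
                continue;
            for (j = 0; j < deg[i]; j++) {
                int to = nbr[i][j];
                if (covered[to])
                    continue;          /* already reached, or blocked */
                covered[to] = level + 1;
                added++;
            }
        }
    }
}
```

On a five-atom chain 0-1-2-3-4, starting at atom 2 with atom 1 blocked, atoms 3 and 4 receive levels 2 and 3 while atom 0 is never reached.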

static double *computeVdwWeights(CtConnectionTable *ct, int atomIdx, int terminalAtomIdx,
    double reductionFactor, int **r_covered )
/*
  see findDirectionalNeighbors for description. Same thing, only modified for weights
*/
{
    CtAtom *A;
    CtAtomBondData *bond;
    CtBond *bptr;
    int *covered;
    int added;
    int level;
    int toAtom;
    int i, j;
    Split *S;
    int *bsplit;
    int bondIdx;
    double *v_weight;
    double *r_weight; /* reference weight, so anchor atoms are included in next aggregate */

    v_weight = (double *) calloc(ct->atomCount, sizeof(double) );
    for ( i = 0; i < ct->atomCount; i++ )
        v_weight[i] = 1.0;
    if ( r_covered )
        *r_covered = (int *) 0;

    if ( atomIdx < 0 || atomIdx >= ct->atomCount || reductionFactor == 1.0 )
        return v_weight;
    if ( terminalAtomIdx >= ct->atomCount )
        return v_weight;
    S = FindBreakPoints(ct, 2, 1, 0 );
    if ( !S || S->s2cnt == 0 )
    {
        if ( S )
            freeSplit(S);
        return v_weight;
    }

    bsplit = S->bondMask;
    r_weight = (double *) calloc(ct->atomCount, sizeof(double) );
    for ( i = 0; i < ct->atomCount; i++ )
        r_weight[i] = 1.0;

    A = ct->atoms + atomIdx; /* index is zero based */
    covered = (int *) calloc(ct->atomCount, sizeof(int) );
    covered[atomIdx] = 1;
    if ( terminalAtomIdx >= 0 )
        covered[terminalAtomIdx] = -1; /* -1 means do not cross this atom, it is the
                                          anchor/terminal atom */
    added = 1;
    for ( level = 1; added && level <= ct->atomCount; level++ )
    {
        for ( i = added = 0; i < ct->atomCount; i++ )
        {
            if ( covered[i] == level )
            {
                A = ct->atoms + i;
                for ( j = 0, bond = A->bond; j < A->bondCount; j++, bond++ )
                {
                    toAtom = bond->toAtom;
                    if ( covered[ toAtom ] )
                        continue;
                    bondIdx = bond->ptr - ct->bonds;
                    if ( bsplit[bondIdx] )
                        r_weight[toAtom] = r_weight[i] * reductionFactor;
                    else
                        r_weight[toAtom] = r_weight[i];
                    v_weight[toAtom] = r_weight[i];
                    covered[toAtom] = level + 1;
                    added++;
                }
            }
        }
    }

    free((char *) r_weight );
    freeSplit(S);
    if ( r_covered )
        *r_covered = covered;
    else
        free((char *) covered);
    for ( i = 0; i < ct->atomCount; i++ )
    {
        if ( v_weight[i] < 0.6 ) /* minimum atom weight */
            v_weight[i] = 0.6;
    }
    return v_weight;
}
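The attenuation rule in computeVdwWeights can be illustrated on a linear chain. This is a sketch under stated assumptions, not the routine itself: chain_weights and MIN_ATOM_WEIGHT are hypothetical names, the molecule is reduced to atoms 0..n-1 in a row, and rotBond[i] plays the role of the break-point bond mask for the bond between atom i and atom i+1.

```c
#include <assert.h>

#define MIN_ATOM_WEIGHT 0.6   /* same floor as the listing */

/* Walk a linear chain of n atoms outward from atom 0. An atom takes the reference
   weight of its parent; the reference weight is reduced only after a break-point
   bond has been crossed, so the atom just across that bond is not yet attenuated. */
static void chain_weights(int n, const int *rotBond, double factor, double *v)
{
    double ref = 1.0;   /* reference weight carried outward (r_weight in the listing) */
    int i;

    assert(n > 0);
    v[0] = 1.0;
    for (i = 0; i + 1 < n; i++) {
        v[i + 1] = ref;        /* far atom gets the weight before reduction */
        if (rotBond[i])
            ref *= factor;     /* atoms further out are attenuated */
    }
    for (i = 0; i < n; i++)
        if (v[i] < MIN_ATOM_WEIGHT)
            v[i] = MIN_ATOM_WEIGHT;
}
```

With a factor of 0.75 and break-point bonds at positions 0, 2 and 3 of a five-atom chain, the last atom would fall to 0.5625 and is therefore raised to the 0.6 floor.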


int TOP_HEV_COUNT(struct CtConnectionTable *ct) 
{ 

CtAtom *atomp; 
int i; 

int hevCount; 

for ( i = hevCount = 0, atomp = ct->atoms; i < ct->atomCount; i++, atomp++ )
{
    if ( atomp->class != CtAtomElement )
        continue;
    if ( atomp->id.atomicNumber != HYDROGEN )
        hevCount++;
}

return hevCount; 

} 

static int *createAtomMask(CtConnectionTable *ct, int termflag, int *r_hevCount)
{
    int *atomMask;
    CtAtom *atomp;
    int i;
    int hevCount;

    atomMask = (int *) calloc(ct->atomCount, sizeof(int) );
    for ( i = hevCount = 0, atomp = ct->atoms; i < ct->atomCount; i++, atomp++ )
    {
        if ( atomp->class != CtAtomElement )
            continue;
        if ( atomp->id.atomicNumber == HYDROGEN )
            continue;
        hevCount++; /* count hev if terminal or not */
        if ( !termflag && atomp->bondCount == 1 )
            continue; /* don't count the terminal atoms */
        atomMask[i] = 1;
    }
    *r_hevCount = hevCount;
    return (atomMask);
}

/*
  For a bond in a ct, determine whether splitting this bond leaves two pieces
  that each contain at least a minimum number of heavy atoms (variable minHev).
  When the terminal flag is true, terminal atoms are counted; when it is false,
  terminal atoms are not counted even if they are heavy atoms.

  Two arrays the size of ct->atomCount are returned; a three-way indicator is
  set for each atom in each set:

     0: atom is not in the set
     1: atom is in the set
    -1: atom is the anchor atom in the set
*/


static int validBreakPoint(CtConnectionTable *ct, int bondIdx, int *atomMask, int minHev, int termflag,
    int **r_b1, int **r_b2 )
{
    CtBond *bondp;
    CtAtom *atomp;
    int *d1, *d2;
    int d1hevcnt, d2hevcnt;
    int termPassed;
    int i;
#ifdef DEBUG_VALID_B
    int d1cnt, d2cnt;
#endif

    bondp = ct->bonds + bondIdx;
    atomp = ct->atoms + bondp->atoms[0];
    if ( atomp->class != CtAtomElement || atomp->id.atomicNumber == HYDROGEN )
        return 0;
    atomp = ct->atoms + bondp->atoms[1];
    if ( atomp->class != CtAtomElement || atomp->id.atomicNumber == HYDROGEN )
        return 0;

    d1 = findDirectionalNeighbors(ct, bondp->atoms[0], bondp->atoms[1], -1 );
    d2 = findDirectionalNeighbors(ct, bondp->atoms[1], bondp->atoms[0], -1 );

#ifdef DEBUG_VALID_B
    d1cnt = d2cnt = 0;
    fprintf(stdout, "atom set: %d %d\n", bondp->atoms[0] + 1, bondp->atoms[1] + 1 );
    for ( i = 0; i < ct->atomCount; i++ )
    {
        if ( d1[i] > 0 )
            fprintf(stdout, " %d ", i + 1 );
    }
    fprintf(stdout, "\n");
    for ( i = 0; i < ct->atomCount; i++ )
    {
        if ( d2[i] > 0 )
            fprintf(stdout, " %d ", i + 1 );
    }
    fprintf(stdout, "\n");
#endif

    for ( i = d1hevcnt = d2hevcnt = 0; i < ct->atomCount; i++ )
    {
#ifdef DEBUG_VALID_B
        if ( d1[i] > 0 )
            d1cnt++;
        if ( d2[i] > 0 )
            d2cnt++;
#endif
        if ( atomMask[i] )
        {
            if ( d1[i] > 0 )
                d1hevcnt++;
            if ( d2[i] > 0 )
                d2hevcnt++;
        }
    }

#ifdef DEBUG_VALID_B
    fprintf(stdout, "%d of %d and %d of %d \n",
            d1hevcnt, d1cnt, d2hevcnt, d2cnt );
#endif

    if ( d1hevcnt < minHev || d2hevcnt < minHev )
    {
        *r_b1 = (int *) 0;
        *r_b2 = (int *) 0;
        free(d1);
        free(d2);
        return 0;
    }

    *r_b1 = d1;
    *r_b2 = d2;
    return 1;
}
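The acceptance test at the heart of validBreakPoint is a pair of masked counts. A minimal sketch follows, assuming the two side arrays have already been produced by a directional search; split_is_valid is a hypothetical name and plain int arrays stand in for the listing's data structures.

```c
#include <assert.h>
#include <stddef.h>

/* d1 and d2 label the atoms on each side of the candidate bond (> 0 means the atom
   is in that side, -1 marks the anchor, 0 means not in the side). atomMask is 1 for
   every heavy atom that should be counted. Returns 1 when both sides hold at least
   minHev counted atoms, otherwise 0. */
static int split_is_valid(const int *d1, const int *d2, const int *atomMask,
                          int atomCount, int minHev)
{
    int i, n1 = 0, n2 = 0;

    assert(d1 != NULL && d2 != NULL && atomMask != NULL);
    for (i = 0; i < atomCount; i++) {
        if (!atomMask[i])
            continue;
        if (d1[i] > 0)
            n1++;
        if (d2[i] > 0)
            n2++;
    }
    return n1 >= minHev && n2 >= minHev;
}
```

Because anchor atoms carry -1 they are never counted toward the opposite side, so a five-atom chain split between atoms 1 and 2 yields sides of two and three counted atoms.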

static int BuildFrags(Split *S)
{
    int i, j;
    Frag *curr;
    int *atoms;
    int cnt;
    int atomCount;
    int *aptr;
    int atomsBaseIdx = -1;
    int copyBaseIdx;
    int *ordering;
    int natms;
    double *coo;
    struct CtConnectionTable *ct;

    if ( !S || !S->ct )
    {
        fprintf(stderr, "Build frags has no ct to copy from\n");
        return -1;
    }
    if ( S->fragsBuilt )
        return 0;
    S->fragsBuilt = 1;
    ct = S->ct;
    atomCount = ct->atomCount;
    atoms = (int *) malloc( atomCount * sizeof(int) );

    for ( i = 0, curr = S->frags; i < S->numFrags; i++, curr++ )
    {
        if ( curr->ct )
        {
            continue; /* already built */
        }
        memset((char *) atoms, '\0', sizeof(int) * atomCount );
        atomsBaseIdx = -1;
        for ( j = cnt = 0, aptr = curr->atoms; j < atomCount; j++, aptr++ )
        {
            if ( *aptr )
            {
                if ( *aptr == -1 )
                    atomsBaseIdx = j;
                atoms[cnt] = j + 1;
                cnt++;
            }
        }
        curr->ct = DB_CT_UTL_COPY_CT(ct, cnt, atoms, &ordering, CtCopyKeepAllAttrs );
        if ( !curr->ct )
            continue;
        copyBaseIdx = -1;
        for ( j = 0; j < cnt; j++ )
        {
            if ( ordering[j] == atomsBaseIdx )
                copyBaseIdx = j;
        }
        curr->copyBaseAtom = copyBaseIdx;
        if ( copyBaseIdx == -1 )
            continue;
        curr->origMapping = (int *) malloc(sizeof(int) * cnt );
        memcpy((char *) curr->origMapping, (char *) ordering, sizeof(int) * cnt );

        DB_CT_UTL_FIND_RINGS(curr->ct);
        UTL_ERROR_CLEAR();
        DB_CT_GET_CT_ATTR( curr->ct, CtCt3DCoordSet, &coo, &natms );
        curr->cords = coo;
        topAlignCt(curr->ct, curr->copyBaseAtom, S->featureMask, curr->origMapping );
        /* align compound according to topomer rules -- all trans */
        if ( qmode )
            getQueryExtents(curr->cords, curr->atomCnt);
    }

    if ( atoms )
        free((char *) atoms );
    return 0;
}

static void getQueryExtents(double *coords, int atomCnt ) 

{ 

int i; 

double x,y,z; 

for ( i = 0; i < atomCnt; i++ )
{
    x = *coords;
    y = *(coords+1);
    z = *(coords+2);
    coords += 3;
if ( x < qxmin ) 

qxmin = x; 
if ( x > qxmax ) 

qxmax = x; 

if ( y < qymin ) 

qymin = y; 
if ( y > qymax ) 

qymax = y; 

if ( z < qzmin ) 

qzmin = z; 
if ( z > qzmax ) 

qzmax = z; 

} 

} 
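getQueryExtents accumulates an axis-aligned bounding box over packed x, y, z triples in file-level variables. The same bookkeeping can be sketched with the limits held in a structure; Extents, extents_reset and extents_add are hypothetical names, and the use of DBL_MAX as the starting value is an assumption, since the initialization of qxmin through qzmax is not shown in this part of the listing.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical container for the six limits kept in qxmin..qzmax in the listing. */
typedef struct {
    double min[3];
    double max[3];
} Extents;

/* Start from an empty box so that the first atom added always sets every limit. */
static void extents_reset(Extents *e)
{
    int k;
    for (k = 0; k < 3; k++) {
        e->min[k] = DBL_MAX;
        e->max[k] = -DBL_MAX;
    }
}

/* Grow the box to include atomCnt atoms stored as consecutive x, y, z triples. */
static void extents_add(Extents *e, const double *coords, int atomCnt)
{
    int i, k;

    assert(atomCnt >= 0);
    for (i = 0; i < atomCnt; i++, coords += 3) {
        for (k = 0; k < 3; k++) {
            if (coords[k] < e->min[k])
                e->min[k] = coords[k];
            if (coords[k] > e->max[k])
                e->max[k] = coords[k];
        }
    }
}
```

Calling extents_add once per fragment, without resetting in between, gives the combined extents of all query fragments, which matches how BuildFrags calls getQueryExtents inside its fragment loop.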

static int BuildTopomers(CtConnectionTable *ct, Split *S, Split *query)
{
    int i, j;
    Frag *curr;
    int cnt;
    int atomCount;
    int *aptr;
    int a1;
    int genHex;
    double outside;
    static IRegionPtr r;
    double *cf;
    double *cf2;
    char *hexStr;
    int *fragMask;
    split2 *qs2;
    split3 *qs3;
    split2 *s2;
    split3 *s3;
    int topskip;

    if ( !S || !ct )
        return -1;
    if ( !stdRegion )
        stdRegion = TOP_MAKE_STD_REGION();
    UTL_ERROR_CLEAR();

    if ( !q_matrixMode )
        makeTopRegions(q_StepSize, S->numFrags);
    else
    {
        regions[0] = stdRegion;
        maxregions = 1;
    }

    S->ct = ct;
    BuildFrags(S);

    genHex = 0;
#ifdef USE_HEX
    genHex = 1;
#endif
    if ( q_debugfp )
        genHex = 1;
    fragMask = (int *) 0;
#ifndef NO_STRMAP
    if ( query )
    {
        fragMask = (int *) calloc(S->numFrags, sizeof(int) );

        /* Find which fragments to actually build the topomer fields for: only those
           for which the features don't disqualify this fragment combination */
        if ( query->s2 && S->s2 && q_do2piece && query->alloc2Map )
        {
            for ( i = 0, qs2 = query->s2;
                  i < query->s2cnt && qs2->strMap;
                  i++, qs2++ )
            {
                for ( j = 0; j < S->s2cnt; j++ )
                {
                    if ( qs2->strMap[j] )
                    {
                        s2 = S->s2 + j;
                        fragMask[ s2->frag1 ] = 1;
                        fragMask[ s2->frag2 ] = 1;
                    }
                }
            }
        }

        if ( query->s3 && S->s3 && q_do3piece && query->alloc3Map )
        {
            for ( i = 0, qs3 = query->s3; i < query->s3cnt && qs3->strMap; i++, qs3++ )
            {
                for ( j = 0; j < S->s3cnt; j++ )
                {
                    if ( qs3->strMap[j] )
                    {
                        s3 = S->s3 + j;
                        fragMask[ s3->frag1 ] = 1;
                        fragMask[ s3->frag2 ] = 1;
                        fragMask[ s3->frag3 ] = 1;
                        fragMask[ s3->frag4 ] = 1;
                    }
                }
            }
        }

        if ( query->s2 && S->s3 && q_doSubset && query->allocSubsetMap )
        {
            for ( i = 0, qs2 = query->s2; i < query->s2cnt && qs2->subsetMap; i++, qs2++ )
            {
                for ( j = 0; j < S->s3cnt; j++ )
                {
                    if ( qs2->subsetMap[j] )
                    {
                        s3 = S->s3 + j;
                        fragMask[ s3->frag1 ] = 1;
                        fragMask[ s3->frag2 ] = 1;
                        fragMask[ s3->frag3 ] = 1;
                        fragMask[ s3->frag4 ] = 1;
                    }
                }
            }
        }
    }
#endif

    for ( i = topskip = 0, curr = S->frags; i < S->numFrags; i++, curr++ )
    {
        if ( !curr->ct || curr->copyBaseAtom == -1 || !curr->cords )
            continue;
#if 0
        if ( q_coremode && !qmode && i%2 )
            continue;
#endif
#ifndef NO_STRMAP
        if ( fragMask && fragMask[i] == 0 )
        {
            topskip++;
            continue;
        }
#endif
        if ( q_debugfp )
        {
            writeCopy(q_debugfp, curr->ct, i, -1, (searchCnt > 0 ) ? "TS_SID" : "TS_QID" );
            if ( debug2 )
                writeCopy(debug2, curr->ct, i, -1, (searchCnt > 0 ) ? "TS_SID" : "TS_QID" );
        }

        a1 = curr->copyBaseAtom;
#ifdef DEBUG_DETAIL
        if ( q_debugfp )
        {
            fprintf(q_debugfp, "# frag: %d base: %d atomCnt: %d\n",
                    i + 1, a1 + 1, ct->atomCount);
        }
#endif
        curr->AtWts = computeVdwWeights(ct, a1, -1, q_ReductionFactor, (int **) 0 );

        if ( curr->id >= S->s2cnt )
            minRegion = minRegion3P;
        else
            minRegion = minRegion2P;
        if ( !qmode )
        {
            r = getRegionToUse(curr->cords, curr->atomCnt, &(curr->regionIdx),
                               &(curr->npoints) );
            curr->outside = atomsOutside(curr->cords, curr->atomCnt, r, curr->AtWts,
                                         &(curr->outsidePenalty) );
            curr->topField = TOP_STER_ATOM_EVAL_ALL_RB_ATTEN(curr->ct, r,
                                         a1 + 1,
                                         curr->cords, curr->AtWts );
#ifndef NO_COMPRESSION
            cf = compressField(curr->topField, r->n_points );
            curr->topField = cf;
#endif
#ifdef STD_REGION
            curr->stdField = TOP_STER_EVAL_ALL_RB_ATTEN(curr->ct, stdRegion,
                                         a1 + 1,
                                         curr->cords, curr->AtWts );
#endif
        }
        else
        {
            r = getRegionToUse(curr->cords, curr->atomCnt, &(curr->regionIdx),
                               &(curr->npoints) );
            if ( curr->id >= S->s2cnt && curr->regionIdx > minRegion3P )
            {
                minRegion3P = curr->regionIdx;
            }
            if ( curr->regionIdx > minRegion2P )
                minRegion2P = curr->regionIdx;
            else
                curr->regionIdx = minRegion2P;
            for ( j = 0; j < maxregions; j++ )
            {
                r = regions[j];
#if 0
                curr->qtf[j] = TOP_STER_EVAL_ALL_RB_ATTEN(curr->ct,
                                         regions[j], a1 + 1,
                                         curr->cords, curr->AtWts );
                compareFields(curr->qtf[j], cf, r->n_points );
                cf2 = compressField(cf, r->n_points );
                free((char *) cf2 );
#endif
                curr->qtf[j] = TOP_STER_ATOM_EVAL_ALL_RB_ATTEN(curr->ct,
                                         regions[j], a1 + 1,
                                         curr->cords, curr->AtWts );
#ifndef NO_COMPRESSION
                cf = compressField(curr->qtf[j], r->n_points );
                curr->qtf[j] = cf;
#endif
            }
            if ( !((i+1) % 10) )
                fprintf(stderr, "Built Query fragments: %d of %d\n", i + 1, S->numFrags );
#ifdef STD_REGION
            curr->stdField = TOP_STER_EVAL_ALL_RB_ATTEN(curr->ct, stdRegion,
                                         a1 + 1,
                                         curr->cords, curr->AtWts );
#endif
        }

        if ( q_debugfp && !qmode && curr->topField )
        {
            /* curr->topHex */
            cf = TOP_STER_EVAL_ALL_RB_ATTEN(curr->ct, stdRegion, a1 + 1,
                                            curr->cords, curr->AtWts );
            hexStr = strdup(CT_FIELD2HEX(cf, stdRegion->n_points));
            fprintf(q_debugfp, "# %s\n", hexStr );
#ifdef NO_COMPRESSION
            free((char *) cf); /* don't free the field with compression enabled */
#endif
            free((char *) hexStr );
        }
    }

    if ( fragMask )
        free((char *) fragMask );
#if 0
    if ( topskip )
        fprintf(stderr, "skipped building %d of %d topomers\n", topskip, S->numFrags );
#endif
    return 0;
}

static CtBond *getBond(struct CtConnectionTable *ct, int id1, int id2 )
{
    int i;
    CtAtomBondData *abd;
    CtAtom *a;

    a = ct->atoms + id1;
    for ( i = 0, abd = a->bond; i < a->bondCount; i++, abd++ )
    {
        if ( abd->toAtom == id2 )
            return abd->ptr;
    }
    return (CtBond *) 0;
}

/*
  Align the ct fragment according to topomer alignment rules:
  adjust all torsions to a trans position for all single bonds with
  non-terminal atoms, and do a reflection if needed for all prochiral atoms.
  Rob Jilek: Nov. 2000
*/
static int topAlignCt(struct CtConnectionTable *ct, int baseAtom, int *featureMask, int *ctMapping )
{

    int *atoms;
    int *atomDist;
    int *singleBonds;
    int *toAtoms;
    int *secChoice;
    double *molWeights;
    int i, j;
    int idx;
    int status;
    int distance;
    CtAtom *atomp;
    CtAtomBondData *bi;
    CtBond *bondp;
    CtBondTypeDef bondType;
    CtSimpleBondTypeDef simpleTypes;
    int bcnt;
    int priority[4];
    struct cipSupportDef *support;
    int a0, a1, a2, a3;
    int rbondsJoined;
    double *cord;
    double torsion;
    int dorefle;
    int mode;
    int hcnt, fcnt, clcnt, brcnt;
    int planeAtoms[3];
    char *atomMessage[] = { "na", "important", "chiral" };
    double *tors;

    if ( !DB_CT_GET_CT_ATTR( ct, CtCt3DCoordSet, &cord, &i) )
        return -1;
    g_ct = ct;

    singleBonds = (int *) calloc(sizeof(int), ct->bondCount );
    atoms = (int *) calloc(sizeof(int), ct->atomCount );
    tors = (double *) calloc(sizeof(double), ct->atomCount );

    for ( idx = 0, bondp = ct->bonds;
          idx < ct->bondCount;
          idx++, bondp++ )
    {
#define TOP_ALIGN_DOUBLE
#ifdef TOP_ALIGN_DOUBLE
        if ( bondp->simpleBondType == CtSimpleBondTypeNotSimple )
        {
            bondType = DB_CT_GET_BOND_TYPE(ct, STDID(idx), &bcnt,
                                           &simpleTypes );
            if ( !( bondType == CtBondTypeSingle || bondType == CtBondTypeDouble ) )
                continue;
        }
        else
        {
            if ( !( bondp->simpleBondType == CtSimpleBondTypeSingle ||
                    bondp->simpleBondType == CtSimpleBondTypeDouble ) )
                continue; /* must be single or double */
        }
#else
        if ( !( bondp->simpleBondType == CtSimpleBondTypeSingle ||
                bondp->simpleBondType == CtSimpleBondTypeNotSimple ) )
            continue; /* must be single, check NotSimple next. */
        if ( bondp->simpleBondType == CtSimpleBondTypeNotSimple )
        {
            bondType = DB_CT_GET_BOND_TYPE(ct, STDID(idx), &bcnt,
                                           &simpleTypes );
            if ( bondType != CtBondTypeSingle )
                continue;
        }
#endif

#if 0
        if ( AB_IN_RING(bondp) )
            continue;
        /* Jan, 16th 2000 - align the hydrogens and other terminal atoms */
        /* if either atom attached to this bond is terminal, then ignore this bond */
        atomp = ct->atoms + bondp->atoms[0];
        if ( atomp->bondCount == 1 )
            continue;
        atomp = ct->atoms + bondp->atoms[1];
        if ( atomp->bondCount == 1 )
            continue;
#endif

        /* We have a bond and the atoms we wish to adjust the torsions on */
        singleBonds[idx] = 1;
        atoms[ bondp->atoms[0] ] = 1;
        atoms[ bondp->atoms[1] ] = 1;
    }

    /* now add in the prochiral atoms */
    support = DB_CT_CHIRAL_CIP_SETUP();
    for ( i = 1; i <= ct->atomCount; i++ )
    {
        status = DB_CT_UTL_IS_CHIRAL_TYPE(ct, i, 1, 1, &hcnt, &fcnt, &clcnt, &brcnt );
        if ( status == 0 )
            continue;
        if ( status == -1 )
        {
            UTL_ERROR_CLEAR();
            continue;
        }
        status = DB_CT_CHIRAL_GET_RS_PRIORITY(ct, i, priority, support );
        if ( status == 0 )
            continue;
        atoms[i-1] = 2; /* mark it differently: this is a prochiral atom */
    }
    DB_CT_CHIRAL_CIP_FREE(support);

    atomDist = findDirectionalNeighbors(ct, baseAtom, -1, -1 );
    molWeights = computePathWeights(ct, baseAtom, atomDist, featureMask, ctMapping );
    toAtoms = findLargestBranch(ct, atomDist, molWeights );
    g_atomDist = atomDist;

    a1 = baseAtom;
    a2 = toAtoms[a1];
    a3 = toAtoms[a2];
    if ( a3 == -1 )
        TOP_ALIGN_MOL(cord, ct->atomCount, a1 + 1, a2 + 1, a2 + 1); /* function wants base 1 atom ids */
    else
        TOP_ALIGN_MOL(cord, ct->atomCount, a1 + 1, a2 + 1, a3 + 1); /* function wants base 1 atom ids */

    rbondsJoined = 0;
    bondp = getBond(ct, a2, a3);
    if ( bondp && AB_IN_RING(bondp) )
        rbondsJoined++;

    torsion = 180.0;
    if ( rbondsJoined == 1 )
        torsion = 90.0;

    /* where a1 is baseAtom, a2 is toAtoms[a1], etc */
    if ( a3 != -1 )
        setRootTorsion(cord, ct->atomCount, a1, a2, a3, torsion );
#ifdef DEBUG_DETAIL
    if ( q_debugfp )
    {
        fprintf(q_debugfp, "# root: fixed %d %d %d %6.0lf\n", a1, a2, a3, torsion );
        for ( i = 0; i < ct->atomCount; i++ )
        {
            fprintf(q_debugfp, "# toAtom %d-> %d (%d %d) \n",
                    i, toAtoms[i], atomDist[i],
                    ( toAtoms[i] >= 0 ) ? atomDist[ toAtoms[i] ] : -1 );
        }
    }
#endif

    /* now adjust the torsion in atom distance order */
    for ( distance = 2; distance <= ct->atomCount; distance++ )
    {
        for ( i = 0; i < ct->atomCount; i++ )
        {
            if ( atoms[i] == 0 || i == baseAtom )
                continue; /* not interested in this atom */
            if ( atomDist[i] != distance )
                continue; /* we are not doing this distance from the base atom now */

            if ( atoms[i] == 2 && !getFromRingCount(ct, atomDist, i, toAtoms[i] ) ) /* a chiral atom */
            {
                /* we can NOT convert if either main chain bonds are in a ring */
                a0 = baseAtom;
                a2 = i;
                getFromChiralAtoms(ct, atomDist, molWeights, i, toAtoms[i], &a1, &a3 );
#ifdef DEBUG_DETAIL
                if ( q_debugfp )
                    fprintf(q_debugfp, "# reflect torsion atoms: %d %d %d %d\n",
                            a0, a1, a2, a3 );
#endif
                if ( a0 != -1 && a1 != -1 && a2 != -1 && a3 != -1 && a0 != a1 )
                {
                    torsion = UTL_GEOM_TAU( cord+(a0*3), cord+(a1*3),
                                            cord+(a2*3), cord+(a3*3) );
                    UTL_ERROR_CLEAR();
                    if ( torsion < 0.0 )
                        torsion += 360.0;
                    mode = (atomDist[i] - 1) % 2;
#ifdef DEBUG_DETAIL
                    if ( q_debugfp )
                        fprintf(q_debugfp, "# reflect torsion: %d %d %d %d %6.0lf mode:%d\n",
                                a0, a1, a2, a3, torsion, mode );
#endif
#ifdef ALTERNATECHIRAL
                    if ( (mode == 1 && torsion > 180.0) || (mode == 0 && torsion < 180.0) )
#endif
                    {
                        planeAtoms[0] = a1;
                        planeAtoms[1] = i;
                        planeAtoms[2] = toAtoms[i];
                        reflectAtoms(cord, ct->atomCount, 3, planeAtoms );
                        tors[i] = torsion * 100.0;
#ifdef DEBUG_DETAIL
                        if ( q_debugfp )
                            fprintf(q_debugfp, "# reflected: %d %d %d\n",
                                    planeAtoms[0], planeAtoms[1],
                                    planeAtoms[2] );
#endif
#ifdef ALTERNATECHIRAL
#endif
                    }
                }
            }

            a1 = i;
            atomp = ct->atoms + i;
            for ( j = 0, bi = atomp->bond; j < atomp->bondCount; j++, bi++ )
            {
                if ( atomDist[ bi->toAtom ] != (distance + 1) )
                    continue;
                idx = bi->ptr - ct->bonds;
#ifdef DEBUG_DETAIL
                if ( q_debugfp )
                    fprintf(q_debugfp, "# atominfo %d %d (%d %d) = %d\n",
                            a1 + 1, bi->toAtom + 1,
                            bi->ptr->refIdx, idx,
                            singleBonds[idx] );
#endif
                if ( singleBonds[ idx ] == 0 ) /* make sure rotatable bond */
                    continue;
                a2 = bi->toAtom;
                a0 = getFromAtom(ct, atomDist, molWeights, i, a2, baseAtom, cord );
                /* a2 = toAtoms[i]; */
                a3 = toAtoms[a2];

                if ( a0 == -1 || a1 == -1 || a2 == -1 || a3 == -1 )
                {
                    if ( q_debugfp )
                        fprintf(q_debugfp, "# not aligned one or more of the atom ids is -1: %d %d %d %d\n",
                                a0, a1, a2, a3 );
                    continue;
                }

                /* count the number of ring bonds joined */
                rbondsJoined = 0;
                bondp = getBond(ct, a0, a1);
                if ( bondp && AB_IN_RING(bondp) )
                    rbondsJoined++;
                bondp = getBond(ct, a2, a3);
                if ( bondp && AB_IN_RING(bondp) )
                    rbondsJoined++;

                torsion = 180.0;
                if ( rbondsJoined == 1 )
                    torsion = 90.0;
                else if ( rbondsJoined == 2 )
                    torsion = 60.0;
                setTorsion(cord, ct->atomCount, a0, a1, a2, a3, torsion );
                tors[a1] = torsion;
#ifdef DEBUG_DETAIL
                if ( q_debugfp )
                    fprintf(q_debugfp, "# torsion: %d %d %d %d %6.0lf\n", a0, a1,
                            a2, a3, torsion );
#endif
            }
        }
    }

#ifdef DEBUG_DETAIL
    if ( q_debugfp )
    {
        for ( i = 0; i < ct->atomCount; i++ )
        {
            fprintf(q_debugfp, "# %2d: %2d %2d %8.2lf %7.2lf %s \n",
                    i+1, atomDist[i], toAtoms[i], molWeights[i], tors[i],
                    atomMessage[ atoms[i] ]);
        }
    }
#endif

    free((char *) atomDist);
    free((char *) molWeights);
    free((char *) toAtoms );
    free((char *) singleBonds);
    free((char *) atoms );
    free((char *) tors );

    return 0;
}


static int getFromAtom(struct CtConnectionTable *ct, int *atomdist, double *molWeights, int atom,
    int toAtom, int baseAtom, double *cord )
{
    int i;
    int bestb[4];
    int nlowest;
    int nbest;
    double bestw;
    CtAtom *A;
    CtAtom *aptr;
    CtAtomBondData *abd;
    double tors[4];
    double tlow;

    A = ct->atoms + atom;
    if ( atomdist[atom] == 1 )
        return -1;

    /* otherwise it isn't the base atom */
    bestw = -1.0;
    bestb[0] = bestb[1] = bestb[2] = bestb[3] = -1;
    nbest = 0;
    for ( i = nbest = 0, abd = A->bond; i < A->bondCount; i++, abd++ )
    {
        if ( atomdist[ abd->toAtom ] == ( atomdist[ atom ] - 1) )
        {
            if ( molWeights[ abd->toAtom ] > bestw )
            {
                nbest = 0;
                bestw = molWeights[ abd->toAtom ];
                bestb[nbest] = abd->toAtom;
                nbest++;
            }
            else if ( molWeights[ abd->toAtom ] == bestw && nbest < 4 )
            {
                bestb[nbest] = abd->toAtom;
                nbest++;
            }
        }
    }

    if ( nbest > 1 )
    {
        /* must break the tie */
        for ( i = nlowest = 0, tlow = 400.0; i < nbest; i++ )
        {
            tors[i] = UTL_GEOM_TAU(cord+(baseAtom*3), cord+(atom*3),
                                   cord+(toAtom*3), cord+(bestb[i]*3) );
            while ( tors[i] < 0.0 )
                tors[i] += 360.0;
            while ( tors[i] > 360.0 )
                tors[i] -= 360.0;
            UTL_ERROR_CLEAR();
#if 0
            if ( tors[i] < 90.0 )
                return bestb[i];
#endif
            if ( tors[i] < tlow )
            {
                nlowest = i;
                tlow = tors[i];
            }
        }
        return bestb[nlowest];
    }
    else if ( nbest == 1 )
        return bestb[0];
    return -1;
}
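The tie-break in getFromAtom folds each candidate torsion into one turn and keeps the candidate with the smallest angle. A minimal sketch of that selection follows; wrap_degrees and lowest_torsion_index are hypothetical names, the torsions are passed in directly rather than computed by UTL_GEOM_TAU, and the half-open range [0, 360) is an assumption (the listing leaves exactly 360.0 unchanged).

```c
#include <assert.h>

/* Fold an angle in degrees into the range [0, 360). */
static double wrap_degrees(double t)
{
    while (t < 0.0)
        t += 360.0;
    while (t >= 360.0)
        t -= 360.0;
    return t;
}

/* Return the index of the candidate with the lowest wrapped torsion.
   The first candidate wins when two wrapped angles are equal. */
static int lowest_torsion_index(const double *tors, int n)
{
    int i, best = 0;
    double low = 400.0;   /* larger than any wrapped angle, as in the listing */

    assert(n > 0);
    for (i = 0; i < n; i++) {
        double t = wrap_degrees(tors[i]);
        if (t < low) {
            low = t;
            best = i;
        }
    }
    return best;
}
```

Wrapping before comparing matters because a torsion reported as -170 degrees is really 190 degrees in this convention and should lose to a candidate at 60 degrees.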

static int getFromRingCount(struct CtConnectionTable *ct, int *atomdist, int atom, int toAtom )
{
    int i;
    int rcnt;
    CtAtom *A;
    CtAtom *aptr;
    CtAtomBondData *abd;

    A = ct->atoms + atom;
    if ( atomdist[atom] == 1 )
        return 0;

    /* otherwise it isn't the base atom */
    for ( i = rcnt = 0, abd = A->bond; i < A->bondCount; i++, abd++ )
    {
        if ( atomdist[ abd->toAtom ] == ( atomdist[ atom ] - 1) )
        {
            if ( AB_IN_RING(abd->ptr) )
                rcnt++;
        }
        else if ( abd->toAtom == toAtom && AB_IN_RING(abd->ptr) )
            rcnt++;
    }
#ifdef DEBUG_DETAIL
    if ( q_debugfp )
        fprintf(q_debugfp, "# atom:%d rcnt:%d\n", atom, rcnt );
#endif
    return rcnt;
}

static int getFromChiralAtoms(struct CtConnectionTable *ct, int *atomdist, double *molw, int atom,
    int toAtom, int *r_fromAtom, int *r_toatom)
{
    int i;
    int ids[2];
    int weight[2];
    int idx = 0;
    int rcnt;
    CtAtom *A;
    CtAtom *aptr;
    CtAtomBondData *abd;
    int t_toAtom, t_length;
    double theWeight;

    A = ct->atoms + atom;
    *r_fromAtom = *r_toatom = -1;
#ifdef DEBUG_DETAIL
    if ( q_debugfp )
        fprintf(q_debugfp, "# chiral atom: %d bondcount: %d toAtom: %d \n",
                atom, A->bondCount, toAtom );
#endif
    for ( i = rcnt = idx = 0, abd = A->bond; i < A->bondCount; i++, abd++ )
    {
#ifdef DEBUG_DETAIL
        if ( q_debugfp )
            fprintf(q_debugfp, "# atom: %d dist:%d toatom:%d dist:%d \n",
                    atom, atomdist[atom], abd->toAtom, atomdist[ abd->toAtom ] );
#endif
        if ( abd->toAtom == toAtom || idx >= 2 )
            continue;
        if ( atomdist[ abd->toAtom ] <= ( atomdist[ atom ] - 1) )
        {
            *r_fromAtom = abd->toAtom;
            continue;
        }
        ids[idx] = abd->toAtom;
        t_toAtom = t_length = -1;
        theWeight = -1.0;
        traverseBranch(ct, abd->toAtom, atomdist, molw, abd->toAtom, &t_toAtom, &t_length,
                       &theWeight );
        weight[idx] = theWeight;
        idx++;
    }

    if ( idx == 2 )
    {
        if ( weight[0] >= weight[1] )
            *r_toatom = ids[0];
        else
            *r_toatom = ids[1];
    }
    else if ( idx == 1 )
        *r_toatom = ids[0];
    return 0;
}

static int getToAtoms( struct CtConnectionTable *ct, int *atomDist, double *molWeights, int idx,
    int *ratom1, int *ratom2 )
{
    int i;
    int targetDistance;
    CtAtomBondData *abd;
    CtAtom *A;
    double bestw;
    int besta;

    A = ct->atoms + idx;
    targetDistance = atomDist[idx] + 1;
    bestw = -1.0;
    besta = -1;
    *ratom1 = *ratom2 = -1;
    for ( i = 0, abd = A->bond; i < A->bondCount; i++, abd++ )
    {
        if ( atomDist[ abd->toAtom ] == targetDistance )
        {
            if ( molWeights[ abd->toAtom ] > bestw )
            {
                bestw = molWeights[ abd->toAtom ];
                besta = abd->toAtom;
            }
        }
    }
    if ( besta == -1 )
        return -1;
    *ratom1 = besta;

    A = ct->atoms + besta;
    targetDistance = atomDist[besta] + 1;
    bestw = -1.0;
    besta = -1;
    for ( i = 0, abd = A->bond; i < A->bondCount; i++, abd++ )
    {
        if ( atomDist[ abd->toAtom ] == targetDistance )
        {
            if ( molWeights[ abd->toAtom ] > bestw )
            {
                bestw = molWeights[ abd->toAtom ];
                besta = abd->toAtom;
            }
        }
    }
    if ( besta == -1 )
        return -1;
    *ratom2 = besta;

    return 0;


static double *computePathWeights(struct CtConnectionTable *ct, int baseAtom, int *atomDist,
    int *featureMask, int *ctMap )
{
    int i, j, k;
    CtAtom *A;
    CtAtom *aptr;
    CtAtomBondData *abd;
    double *weights;
    int distance;
    int nextDistance;
    CtAtomBondData *found;
    double aweight;
    double *raw_weights;
    int toAtom;
    double adjval;
    static double maxadj = -1.0;
    static double feature_align = 1.0;
    FeatureType qfeature, strFeature;

    weights = (double *) calloc(sizeof(double), ct->atomCount );
    raw_weights = (double *) calloc(sizeof(double), ct->atomCount );

    for ( i = 0, aptr = ct->atoms; i < ct->atomCount; i++, aptr++ )
    {
        aweight = 0.0;
        DB_CT_GET_ATOMP_ATOMIC_WEIGHT(aptr, &aweight);
        raw_weights[i] = aweight;
    }

    if ( maxadj == -1.0 )
    {
        char *tptr;

        tptr = getenv("DBTOP_FEATURE_ALIGN_MAXADJ");
        if ( tptr )
        {
            maxadj = atof(tptr);
            fprintf(stderr, "Maximum feature adjustment for alignment: %8.2lf. Set from environment variable: DBTOP_FEATURE_ALIGN_MAXADJ\n", maxadj );
        }
        else
            maxadj = 50.0;

        tptr = getenv("DBTOP_FEATURE_ALIGN_SCALE");
        if ( tptr )
        {
            feature_align = atof(tptr);
            fprintf(stderr, "Feature alignment scaling factor: %8.2lf. Set from environment variable: DBTOP_FEATURE_ALIGN_SCALE\n", feature_align );
        }
        else
            feature_align = 0.5;

        if ( maxadj < 0.0 )
            maxadj = 0.0;
    }

    if ( feature_align < 0.0 )
        feature_align = 0.0;

    if ( q_featureFactor > 0.0 && maxadj > 0.0 && feature_align > 0.0 )
    {
        for ( i = 0; i < ct->atomCount; i++ )
        {
            if ( featureMask[ ctMap[i] ] == FeatureNone )
                continue; /* no single atom feature at this atom */
            for ( k = 0, adjval = 0.0, strFeature = featureMask[ ctMap[i] ]; k < 4; k++ )
            {
                if ( strFeature & fMasks[k] )
                    adjval += q_featureFactor * featureWeights[k+1] *
                              feature_align;
            }
            if ( adjval > maxadj )
                adjval = maxadj;
            raw_weights[i] += adjval;
        }
    }

    for ( i = 0, A = ct->atoms; i < ct->atomCount; i++, A++ )
    {
        if ( i == baseAtom )
            continue;
        aptr = A;
        distance = atomDist[i];
        nextDistance = distance - 1;
        toAtom = i;
        while ( distance )
        {
            weights[i] += raw_weights[toAtom];
            for ( found = (CtAtomBondData *) 0, j = 0, abd = aptr->bond;
                  !found && j < aptr->bondCount; j++, abd++ )
            {
                if ( atomDist[ abd->toAtom ] == nextDistance )
                    found = abd;
            }
            if ( found )
            {
                aptr = ct->atoms + found->toAtom;
                toAtom = found->toAtom;
                nextDistance--;
                distance--;
            }
            else
                distance = 0;
        }
    }

    free((char *) raw_weights);
    return weights;

}

static int traverseBranch( struct CtConnectionTable *ct, int atomId, int *atomdist, double *molweight,
    int rootToAtom, int *r_toatom, int *r_length, double *r_weight )
{
    CtAtom *a;
    CtAtomBondData *abd;
    int j;

    a = ct->atoms + atomId;
    if ( atomdist[ atomId ] > *r_length ||
         ( atomdist[ atomId ] == *r_length && molweight[atomId] > *r_weight ) )
    {
        *r_toatom = rootToAtom;
        *r_length = atomdist[atomId];
        *r_weight = molweight[atomId];
    }

    for ( j = 0, abd = a->bond; j < a->bondCount; j++, abd++ )
    {
        if ( atomdist[ abd->toAtom ] == ( atomdist[atomId] + 1 ) )
        {
#ifdef DEBUG_DETAIL
            if ( debug2 )
                fprintf(debug2, "#-> %d to %d dist:%d %d root:%d\n",
                        atomId, abd->toAtom, atomdist[abd->toAtom],
                        atomdist[atomId], rootToAtom );
#endif
            traverseBranch(ct, abd->toAtom, atomdist, molweight, rootToAtom, r_toatom,
                           r_length, r_weight );
        }
    }
    return 0;
}

/*
  Return an array containing the toAtom for each atom, which points to the
  largest chain based on size and then weight.
*/
static int *findLargestBranch(struct CtConnectionTable *ct, int *atomdist, double *weights )
{
    int *bi;
    int i, j;
    int toAtom;
    int length;
    double theWeight;
    CtAtomBondData *abd;
    CtAtom *atom;

    bi = (int *) calloc(sizeof(int), ct->atomCount );
    for ( i = 0; i < ct->atomCount; i++ )
    {
        atom = ct->atoms + i;
        toAtom = length = -1;
        theWeight = -1.0;
        for ( j = 0, abd = atom->bond; j < atom->bondCount; j++, abd++ )
        {
            if ( atomdist[ abd->toAtom ] == ( atomdist[i] + 1 ) )
            {
#ifdef DEBUG_DETAIL
                if ( debug2 )
                    fprintf(debug2, "# %d to %d dist:%d %d\n",
                            i, abd->toAtom, atomdist[abd->toAtom], atomdist[i] );
#endif
                traverseBranch(ct, abd->toAtom, atomdist, weights, abd->toAtom,
                               &toAtom, &length, &theWeight );
            }
        }
        bi[i] = toAtom;
    }
    return bi;
}
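computePathWeights and findLargestBranch work as a pair: the first sums atom weights along the path from each atom back to the base atom, and the second uses those sums to choose, at every atom, the outward neighbor leading to the deepest atom, with the larger path weight breaking ties. The sketch below illustrates both steps on an acyclic fragment. It is a simplification under stated assumptions: path_weights and largest_branch are hypothetical names, the molecule is described by a parent array (the neighbor one bond closer to the base, -1 for the base) instead of a connection table, and atoms are scanned in index order rather than by recursive traversal, so exact ties in both depth and weight may resolve differently from the listing.

```c
#include <assert.h>

/* Sum raw atom weights along the path from each atom back to, and including,
   the base atom. The base atom itself keeps weight 0, as in the listing. */
static void path_weights(int n, const int *parent, const double *raw, double *out)
{
    int i, a;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        out[i] = 0.0;
        if (parent[i] < 0)
            continue;
        for (a = i; a >= 0; a = parent[a])
            out[i] += raw[a];
    }
}

/* For atom i, return the child whose subtree contains the deepest atom (largest
   dist); equal depths are decided by the larger path weight w of that deepest
   atom. Returns -1 when atom i has no outward neighbor. */
static int largest_branch(int n, const int *parent, const int *dist,
                          const double *w, int i)
{
    int a, b, bestChild = -1, bestLen = -1;
    double bestW = -1.0;

    for (a = 0; a < n; a++) {
        int child = -1;
        /* climb from a toward the base, remembering the last atom seen before i */
        for (b = a; b >= 0 && b != i; b = parent[b])
            child = b;
        if (b != i || child < 0)
            continue;               /* a is i itself, or not below i */
        if (dist[a] > bestLen || (dist[a] == bestLen && w[a] > bestW)) {
            bestChild = child;
            bestLen = dist[a];
            bestW = w[a];
        }
    }
    return bestChild;
}
```

In a fragment where atom 1 branches to a heavy single atom and to a lighter two-atom chain, the chain is chosen because depth is compared before weight.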

static double CompareTwoCompounds(Split *query, Split *str, double radius, int *r_qidx, int *r_sidx,
    int *r_splitidx, int *r_three, int *r_subsethit, double *r_best2, double *r_best3, double *r_bestsub,
    double *r_att_pen, int bailedout )
{
    double best;
    double best2, best3, bestsub;
    double d1, d2, d3, d4, d5, d6;
    double dval[6];
    double cdval[6];
    double attPen[2];
    int hevCnts[6];
    int bestQ, bestStr;
    int bestIdx;
    int threeIsBetter = 0;
    int SubIsBetter = 0;
    int id1, id2, id3, id4;
    int i, j, k, l;
    int ids[3];
    Frag *f, *sf;
    Frag *q1, *q2, *q3, *q4;
    Frag *fs1, *fs2, *fs3, *fs4;
    Frag *fragPtrs[3];
    Frag *qActive;
    split2 *qs2, *ss2;
    split3 *qs3, *ss3;
    double dptr;
    double hexdiff;
    double fieldDiff;
    double outPen;
    double bailout;
    double *cf[6];
    int max3;
    static Split *qInit;

    *r_att_pen = 0.0;
    *r_qidx = bestQ = -1;

    if ( query->numFrags == 0 || str->numFrags == 0 )
        return 9999.0;

    bailout = radius*radius;

    regid = (char *) 0;
    DB_CT_GET_CT_ATTR(str->ct, CtCtRegId, &regid );
    if ( !regid )
        DB_CT_GET_CT_ATTR(str->ct, CtCtName, &regid );

#ifdef USE_HEX
    if ( qInit != query )
    {
        for ( i = 0, f = query->frags; i < query->numFrags; i++, f++ )
        {
            if ( f->topHex )
                f->topInt = hexStringToInts(f->topHex, &(f->topIntSize) );
        }
        qInit = query;
    }
#endif

    for ( i = 0, f = query->frags; i < query->numFrags; i++, f++ )
    {
        if ( f->hexDiff )
            free((char *) f->hexDiff );
#ifdef STD_REGION
        if ( f->stdDiff )
            free((char *) f->stdDiff );
#endif
        f->hexDiff = (double *) calloc(str->numFrags, sizeof(double) );
#ifdef STD_REGION
        f->stdDiff = (double *) calloc(str->numFrags, sizeof(double) );
#endif
        for ( j = 0; j < str->numFrags; j++ )
        {
            f->hexDiff[j] = -1.0;
#ifdef STD_REGION
            f->stdDiff[j] = -1.0;
#endif
        }
    }

#ifdef USE_HEX
    for ( i = 0, f = str->frags; i < str->numFrags; i++, f++ )
    {
        if ( f->topHex )
            f->topInt = hexStringToInts(f->topHex, &(f->topIntSize) );
    }
#endif

#ifdef CALC_BATCH_DIFF
    for ( i = 0, f = query->frags; i < query->numFrags; i++, f++ )
    {
        for ( j = 0, sf = str->frags; j < str->numFrags; j++, sf++ )
        {
#ifdef USE_HEX
            f->hexDiff[j] = fieldIntDiff(f->topInt, sf->topInt, f->topIntSize,
                sf->topIntSize );
#else
            f->hexDiff[j] = topFieldDiff(f->topField, sf->topField, str->npoints );
#endif
            if ( f->featureDiff )
                f->featureDiff[j] = compareFeatures(query, f, str, sf, -1, -1 );
#if 0
            fieldDiff = topFieldDiff(f->topField, sf->topField, str->npoints );
            fprintf(stderr,"hex vs raw: hex:%7.4lf field:%7.4lf diff:%7.4lf \n",
                f->hexDiff[j], fieldDiff, fieldDiff - f->hexDiff[j] );
#endif
#if 0
            hexdiff = fieldHexDiff(f->topHex, sf->topHex, 0 );
            hexdiff *= hexdiff;
            if ( fabs( hexdiff - f->hexDiff[j] ) > 0.0001 )
                fprintf(stderr, "field diff: %8.6lf %8.6lf %8.5lf\n",
                    hexdiff, f->hexDiff[j],
                    hexdiff - f->hexDiff[j] );
#endif
        }
    }
#endif
#if 0
    fprintf(stderr,"s2 cnts:%d %d\n", query->s2cnt, str->s2cnt );
    fflush(stderr);
#endif

    best = best2 = best3 = bestsub = 9999.0 * 9999.0;

    /*
       2 piece steric field comparison
    */
    if ( query->s2 && str->s2 && q_do2piece )
    {
        for ( i = 0, qs2 = query->s2; i < query->s2cnt; i++, qs2++ )
        {
            if ( qs2->frag1 == -1 || qs2->frag2 == -1 )
                continue;
            q1 = query->frags + qs2->frag1;
            q2 = query->frags + qs2->frag2;
            if ( q_partialMatch )
            {
                q1->featureDiff = q1->feature2PDiff;
                q2->featureDiff = q2->feature2PDiff;
            }
            for ( j = 0, ss2 = str->s2; j < str->s2cnt; j++, ss2++ )
            {
                if ( ss2->frag1 == -1 || ss2->frag2 == -1 )
                    continue;
#ifndef NO_STRMAP
                if ( qs2->strMap && qs2->strMap[j] == 0 )
                    continue; /* feature throws this one out */
#endif
                id1 = (str->frags + ss2->frag1)->id;
                id2 = (str->frags + ss2->frag2)->id;
                fs1 = str->frags + ss2->frag1;
                fs2 = str->frags + ss2->frag2;
                t_2compare++;
#if 0
                fprintf(stderr,"ids %d: %d %d\n", j, id1, id2 );
                fflush(stderr);
#endif
                outPen = fs1->outsidePenalty + fs2->outsidePenalty;
                if ( outPen )
                {
                    if ( outPen > bailout )
                    {
                        continue;
                    }
                }

#ifdef NO_COMPRESSION
                q1->hexDiff[id1] = topFieldDiff(q1->qtf[fs1->regionIdx], fs1->topField,
                    fs1->npoints );
                q1->hexDiff[id2] = topFieldDiff(q1->qtf[fs2->regionIdx], fs2->topField,
                    fs2->npoints );
                q2->hexDiff[id1] = topFieldDiff(q2->qtf[fs1->regionIdx], fs1->topField,
                    fs1->npoints );
                q2->hexDiff[id2] = topFieldDiff(q2->qtf[fs2->regionIdx], fs2->topField,
                    fs2->npoints );
#else
                if ( q_featureFactor > 0.0 && q1->featureDiff && q2->featureDiff )
                {
                    q1->hexDiff[id1] = topFieldCompressedDiff(q1->qtf[fs1->regionIdx],
                        fs1->topField, fs1->npoints, q1->featureDiff[id1] );
                    q1->hexDiff[id2] = topFieldCompressedDiff(q1->qtf[fs2->regionIdx],
                        fs2->topField, fs2->npoints, q1->featureDiff[id2] );
                    q2->hexDiff[id1] = topFieldCompressedDiff(q2->qtf[fs1->regionIdx],
                        fs1->topField, fs1->npoints, q2->featureDiff[id1] );
                    q2->hexDiff[id2] = topFieldCompressedDiff(q2->qtf[fs2->regionIdx],
                        fs2->topField, fs2->npoints, q2->featureDiff[id2] );
                }
                else
                {
                    q1->hexDiff[id1] = topFieldCompressedDiff(q1->qtf[fs1->regionIdx],
                        fs1->topField, fs1->npoints, 0.0 );
                    q1->hexDiff[id2] = topFieldCompressedDiff(q1->qtf[fs2->regionIdx],
                        fs2->topField, fs2->npoints, 0.0 );
                    q2->hexDiff[id1] = topFieldCompressedDiff(q2->qtf[fs1->regionIdx],
                        fs1->topField, fs1->npoints, 0.0 );
                    q2->hexDiff[id2] = topFieldCompressedDiff(q2->qtf[fs2->regionIdx],
                        fs2->topField, fs2->npoints, 0.0 );
                }
#endif

#ifdef NO_COMPRESSION
#ifdef COMPRESSCOMPARE
                cf[0] = compressField(q1->qtf[fs1->regionIdx], fs1->npoints );
                cf[4] = compressField(fs1->topField, fs1->npoints );
                cdval[0] = topFieldCompressedDiff( cf[0], cf[4], fs1->npoints );
                fprintf(stderr, "Compressed varies by %7.2lf %6.2lf %6.2lf \n",
                    fabs(q1->hexDiff[id1] - cdval[0]), q1->hexDiff[id1], cdval[0] );
                cf[1] = compressField(q1->qtf[fs2->regionIdx], fs2->npoints );
                cf[5] = compressField(fs2->topField, fs2->npoints );
                cdval[1] = topFieldCompressedDiff( cf[1], cf[5], fs2->npoints );

                free((char *) cf[4] );
                cf[2] = compressField(q2->qtf[fs1->regionIdx], fs1->npoints );
                cf[4] = compressField(fs1->topField, fs1->npoints );

                free((char *) cf[5] );
                cf[3] = compressField(q2->qtf[fs2->regionIdx], fs2->npoints );
                cf[5] = compressField(fs2->topField, fs2->npoints );

                free((char *) cf[0] );
                free((char *) cf[1] );
                free((char *) cf[2] );
                free((char *) cf[3] );
                free((char *) cf[5] );
                free((char *) cf[4] );


#endif 
#endif 


#ifdef STD_REGION
                q1->stdDiff[id1] = topFieldDiff(q1->stdField, fs1->stdField,
                    stdRegion->n_points );
                q1->stdDiff[id2] = topFieldDiff(q1->stdField, fs2->stdField,
                    stdRegion->n_points );
                q2->stdDiff[id1] = topFieldDiff(q2->stdField, fs1->stdField,
                    stdRegion->n_points );
                q2->stdDiff[id2] = topFieldDiff(q2->stdField, fs2->stdField,
                    stdRegion->n_points );

(idx: %d %d) V, i+U+1, ^m^n 

q i->hexbiffiidii-qi-> stdDlffl,dlL 
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a l->hexDimid2]-ql-> s ^2MU 

l>hexDimidn-^ >st ^SS' 
q ; u r»ifffid21 - a2- > stdDifflid2], 


] 

#endif 


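                if ( q_featureFactor > 0.0 )
                {
                    d1 = q1->hexDiff[id1] + q2->hexDiff[id2] + q1->featureDiff[id1]
                        + q2->featureDiff[id2] + outPen;
                    d2 = q1->hexDiff[id2] + q2->hexDiff[id1] + q1->featureDiff[id2]
                        + q2->featureDiff[id1] + outPen;
                }
                else
                {
                    d1 = q1->hexDiff[id1] + q2->hexDiff[id2] + outPen;
                    d2 = q1->hexDiff[id2] + q2->hexDiff[id1] + outPen;
                }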
                if ( d1 < best )
                {
                    bestQ = i;
                    bestStr = j;
                    best = best2 = d1;
                    bestIdx = 0;
                }
                if ( d2 < best )
                {
                    bestQ = i;
                    bestStr = j;
                    best = best2 = d2;
                    bestIdx = 1;
                }
            }
        }
    }

fflii-thrstderrt; 


lyimnV*- ' 

fflush(stderr); 
#endif 


    /*
       3 piece steric field comparison
    */
    for ( i = 0, qs3 = query->s3; q_do3piece && qs3 && i < query->s3cnt; i++, qs3++ )
    {
        if ( qs3->frag1 == -1 || qs3->frag2 == -1 || qs3->frag3 == -1 )
            continue;
        q1 = query->frags + qs3->frag1;
        q2 = query->frags + qs3->frag2;
        q3 = query->frags + qs3->frag3;
        q4 = query->frags + qs3->frag4;
        if ( q_partialMatch )
        {
            q1->featureDiff = q1->feature3PDiff;
            q2->featureDiff = q2->feature3PDiff;
            q3->featureDiff = q3->feature3PDiff;
            q4->featureDiff = q4->feature3PDiff;
        }
        for ( j = 0, ss3 = str->s3; ss3 && j < str->s3cnt; j++, ss3++ )
        {
            if ( ss3->frag1 == -1 || ss3->frag2 == -1 || ss3->frag3 == -1 )
                continue;
#ifndef NO_STRMAP
            if ( qs3->strMap && qs3->strMap[j] == 0 )
                continue; /* can't hit this 3 piece combination because
                             features throws it out */
#endif
            fs1 = str->frags + ss3->frag1;
            fs2 = str->frags + ss3->frag2;
            fs3 = str->frags + ss3->frag3;
            fs4 = str->frags + ss3->frag4;
            id1 = fs1->id;
            id2 = fs2->id;
            id3 = fs3->id;
            id4 = fs4->id;

            t_3compare++;

#ifdef NO_COMPRESSION
#ifdef USE_HEX
            if ( q1->hexDiff[id1] == -1.0 )
                q1->hexDiff[id1] = fieldIntDiff(q1->topInt, fs1->topInt,
                    q1->topIntSize, fs1->topIntSize);
            if ( q1->hexDiff[id4] == -1.0 )
                q1->hexDiff[id4] = fieldIntDiff(q1->topInt, fs4->topInt,
                    q1->topIntSize, fs4->topIntSize);
            if ( q4->hexDiff[id1] == -1.0 )
                q4->hexDiff[id1] = fieldIntDiff(q4->topInt, fs1->topInt,
                    q4->topIntSize, fs1->topIntSize);
            if ( q4->hexDiff[id4] == -1.0 )
                q4->hexDiff[id4] = fieldIntDiff(q4->topInt, fs4->topInt,
                    q4->topIntSize, fs4->topIntSize);
            if ( q2->hexDiff[id2] == -1.0 )
                q2->hexDiff[id2] = fieldIntDiff(q2->topInt, fs2->topInt,
                    q2->topIntSize, fs2->topIntSize);
            if ( q2->hexDiff[id3] == -1.0 )
                q2->hexDiff[id3] = fieldIntDiff(q2->topInt, fs3->topInt,
                    q2->topIntSize, fs3->topIntSize);
            if ( q3->hexDiff[id3] == -1 )
                q3->hexDiff[id3] = fieldIntDiff(q3->topInt, fs3->topInt,
                    q3->topIntSize, fs3->topIntSize);
            if ( q3->hexDiff[id2] == -1 )
                q3->hexDiff[id2] = fieldIntDiff(q3->topInt, fs2->topInt,
                    q3->topIntSize, fs2->topIntSize);
#else
            if ( q1->hexDiff[id1] == -1.0 )
                q1->hexDiff[id1] = topFieldDiff(q1->qtf[fs1->regionIdx],
                    fs1->topField, fs1->npoints );
            if ( q1->hexDiff[id4] == -1.0 )
                q1->hexDiff[id4] = topFieldDiff(q1->qtf[fs4->regionIdx],
                    fs4->topField, fs4->npoints );
            if ( q4->hexDiff[id1] == -1.0 )
                q4->hexDiff[id1] = topFieldDiff(q4->qtf[fs1->regionIdx],
                    fs1->topField, fs1->npoints );
            if ( q4->hexDiff[id4] == -1.0 )
                q4->hexDiff[id4] = topFieldDiff(q4->qtf[fs4->regionIdx],
                    fs4->topField, fs4->npoints );
            if ( q2->hexDiff[id2] == -1.0 )
                q2->hexDiff[id2] = topFieldDiff(q2->qtf[fs2->regionIdx],
                    fs2->topField, fs2->npoints );
            if ( q2->hexDiff[id3] == -1.0 )
                q2->hexDiff[id3] = topFieldDiff(q2->qtf[fs3->regionIdx],
                    fs3->topField, fs3->npoints );
            if ( q3->hexDiff[id3] == -1 )
                q3->hexDiff[id3] = topFieldDiff(q3->qtf[fs3->regionIdx],
                    fs3->topField, fs3->npoints );
            if ( q3->hexDiff[id2] == -1 )
                q3->hexDiff[id2] = topFieldDiff(q3->qtf[fs2->regionIdx],
                    fs2->topField, fs2->npoints );
            outPen = ( (fs1->outsidePenalty + fs2->outsidePenalty) / 2.0 ) +
                fs2->outsidePenalty + fs3->outsidePenalty;
#endif
#endif

#ifndef NO_COMPRESSION
            if ( q1->hexDiff[id1] == -1.0 )
                q1->hexDiff[id1] = topFieldCompressedDiff(q1->qtf[fs1->regionIdx],
                    fs1->topField, fs1->npoints, 0.0 );
            if ( q1->hexDiff[id4] == -1.0 )
                q1->hexDiff[id4] = topFieldCompressedDiff(q1->qtf[fs4->regionIdx],
                    fs4->topField, fs4->npoints, 0.0 );
            if ( q4->hexDiff[id1] == -1.0 )
                q4->hexDiff[id1] = topFieldCompressedDiff(q4->qtf[fs1->regionIdx],
                    fs1->topField, fs1->npoints, 0.0 );
            if ( q4->hexDiff[id4] == -1.0 )
                q4->hexDiff[id4] = topFieldCompressedDiff(q4->qtf[fs4->regionIdx],
                    fs4->topField, fs4->npoints, 0.0 );
            if ( q2->hexDiff[id2] == -1.0 )
                q2->hexDiff[id2] = topFieldCompressedDiff(q2->qtf[fs2->regionIdx],
                    fs2->topField, fs2->npoints,
                    q2->featureDiff ? q2->featureDiff[id2] : 0.0 );
            if ( q2->hexDiff[id3] == -1.0 )
                q2->hexDiff[id3] = topFieldCompressedDiff(q2->qtf[fs3->regionIdx],
                    fs3->topField, fs3->npoints,
                    q2->featureDiff ? q2->featureDiff[id3] : 0.0 );
            if ( q3->hexDiff[id3] == -1 )
                q3->hexDiff[id3] = topFieldCompressedDiff(q3->qtf[fs3->regionIdx],
                    fs3->topField, fs3->npoints,
                    q3->featureDiff ? q3->featureDiff[id3] : 0.0 );
            if ( q3->hexDiff[id2] == -1 )
                q3->hexDiff[id2] = topFieldCompressedDiff(q3->qtf[fs2->regionIdx],
                    fs2->topField, fs2->npoints,
                    q3->featureDiff ? q3->featureDiff[id2] : 0.0 );

            outPen = ( (fs1->outsidePenalty + fs2->outsidePenalty) / 2.0 ) +
                fs2->outsidePenalty + fs3->outsidePenalty;
#endif


#ifdef STDREGION3P
            q1->stdDiff[id1] = topFieldDiff(q1->stdField, fs1->stdField,
                stdRegion->n_points );
            q4->stdDiff[id1] = topFieldDiff(q4->stdField, fs1->stdField,
                stdRegion->n_points );
            q1->stdDiff[id4] = topFieldDiff(q1->stdField, fs4->stdField,
                stdRegion->n_points );
            q4->stdDiff[id4] = topFieldDiff(q4->stdField, fs4->stdField,
                stdRegion->n_points );
            q2->stdDiff[id2] = topFieldDiff(q2->stdField, fs2->stdField,
                stdRegion->n_points );
            q2->stdDiff[id3] = topFieldDiff(q2->stdField, fs3->stdField,
                stdRegion->n_points );
            q3->stdDiff[id3] = topFieldDiff(q3->stdField, fs3->stdField,
                stdRegion->n_points );
            q3->stdDiff[id2] = topFieldDiff(q3->stdField, fs2->stdField,
                stdRegion->n_points );

            fprintf(stderr,"# region diffs %6.2lf %6.2lf %6.2lf %6.2lf %6.2lf %6.2lf %6.2lf %6.2lf (idx: %d %d %d %d) out:%6.2lf\n",
                q1->hexDiff[id1] - q1->stdDiff[id1],
                q1->hexDiff[id4] - q1->stdDiff[id4],
                q4->hexDiff[id1] - q4->stdDiff[id1],
                q4->hexDiff[id4] - q4->stdDiff[id4],
                q2->hexDiff[id2] - q2->stdDiff[id2],
                q2->hexDiff[id3] - q2->stdDiff[id3],
                q3->hexDiff[id3] - q3->stdDiff[id3],
                q3->hexDiff[id2] - q3->stdDiff[id2],
                fs1->regionIdx, fs4->regionIdx, fs2->regionIdx,
                fs3->regionIdx, outPen );
#endif


            attPen[0] = attPen[1] = 0.0;

            dval[0] = (q1->hexDiff[id1] + q4->hexDiff[id4] ) / 2.0 + q2->hexDiff[id2]
                + q3->hexDiff[id3];
            dval[1] = (q1->hexDiff[id4] + q4->hexDiff[id1] ) / 2.0 + q2->hexDiff[id3]
                + q3->hexDiff[id2];

            if ( outPen > 0.0 )
            {
                dval[0] += outPen;
                dval[1] += outPen;
            }

            if ( q_attachPenFactor > 0.0 )
            {
                attPen[0] = ( computeAttachmentPenalty( q1, fs1, q4, fs4 ) +
                    computeAttachmentPenalty( q4, fs4, q1, fs1 ) );
                attPen[1] = ( computeAttachmentPenalty( q1, fs4, q4, fs1 ) +
                    computeAttachmentPenalty( q4, fs1, q1, fs4 ) );
                dval[0] += attPen[0];
                dval[1] += attPen[1];
            }
            if ( q_featureFactor > 0.0 )
            {
                dval[0] += ( q1->featureDiff[id1] + q4->featureDiff[id4] ) / 2.0 +
                    q2->featureDiff[id2] + q3->featureDiff[id3];
                dval[1] += ( q1->featureDiff[id4] + q4->featureDiff[id1] ) / 2.0 +
                    q2->featureDiff[id3] + q3->featureDiff[id2];
            }
            max3 = 2;
            if ( dval[0] < 0.0 )
            {
#if 0
                if ( q_debugfp )
                    fprintf(q_debugfp, "3 below zero #0 %8.4lf %8.4lf %8.4lf %8.4lf (%d %d %d %d) \n",
                        q1->featureDiff[id1], q4->featureDiff[id4],
                        q2->featureDiff[id2], q3->featureDiff[id3],
                        id1, id4, id2, id3 );
#endif
                dval[0] = 0.0;
            }
            if ( dval[1] < 0.0 )
            {
#if 0
                if ( q_debugfp )
                    fprintf(q_debugfp, "3 below zero #1 %8.4lf %8.4lf %8.4lf %8.4lf (%d %d %d %d)\n",
                        q1->featureDiff[id4], q4->featureDiff[id1],
                        q2->featureDiff[id3], q3->featureDiff[id2],
                        id4, id1, id3, id2 );
#endif
                dval[1] = 0.0;
            }

            for ( k = 0; k < max3; k++ )
            {
                if ( dval[k] < best )
                {
                    best = best3 = dval[k];
                    bestQ = i;
                    bestStr = j;
                    bestIdx = k;
                    threeIsBetter = 1;
                    *r_att_pen = attPen[k] > 0.0 ? sqrt(attPen[k]) : 0.0;
                }
                else if ( dval[k] < best3 )
                    best3 = dval[k];
            }
        }
    }


    /*
       subset steric field comparison
    */
    if ( query->s2 && str->s3 && q_doSubset )
    {
        /* loop over query 2 piece fragments, and compare with the structure's
           3 piece fragments. */
        for ( i = 0, qs2 = query->s2; i < query->s2cnt; i++, qs2++ )
        {
            if ( qs2->frag1 == -1 || qs2->frag2 == -1 )
                continue;
            q1 = query->frags + qs2->frag1;
            q2 = query->frags + qs2->frag2;
            if ( q_partialMatch )
            {
                q1->featureDiff = q1->featureSubsetDiff;
                q2->featureDiff = q2->featureSubsetDiff;
            }
            for ( j = 0, ss3 = str->s3; ss3 && j < str->s3cnt; j++, ss3++ )
            {
                if ( ss3->frag1 == -1 || ss3->frag2 == -1 || ss3->frag3 == -1 )
                    continue;
                if ( qs2->subsetMap && qs2->subsetMap[j] == 0 )
                    continue; /* feature throws this one out */
                fs1 = str->frags + ss3->frag1;
                fs2 = str->frags + ss3->frag2;
                fs3 = str->frags + ss3->frag3;
                fs4 = str->frags + ss3->frag4;
                id1 = fs1->id;
                id2 = fs2->id;
                id3 = fs3->id;
                id4 = fs4->id;

#if 1
                if ( q1->hexDiff[id1] == -1.0 )
                    q1->hexDiff[id1] = topFieldCompressedDiff(q1->qtf[fs1->regionIdx],
                        fs1->topField, fs1->npoints, 0.0 );
                if ( q1->hexDiff[id2] == -1.0 )
                    q1->hexDiff[id2] = topFieldCompressedDiff(q1->qtf[fs2->regionIdx],
                        fs2->topField, fs2->npoints, 0.0 );
                if ( q2->hexDiff[id1] == -1.0 )
                    q2->hexDiff[id1] = topFieldCompressedDiff(q2->qtf[fs1->regionIdx],
                        fs1->topField, fs1->npoints, 0.0 );
                if ( q2->hexDiff[id2] == -1.0 )
                    q2->hexDiff[id2] = topFieldCompressedDiff(q2->qtf[fs2->regionIdx],
                        fs2->topField, fs2->npoints, 0.0 );
                if ( q1->hexDiff[id3] == -1.0 )
                    q1->hexDiff[id3] = topFieldCompressedDiff(q1->qtf[fs3->regionIdx],
                        fs3->topField, fs3->npoints, 0.0 );
                if ( q1->hexDiff[id4] == -1.0 )
                    q1->hexDiff[id4] = topFieldCompressedDiff(q1->qtf[fs4->regionIdx],
                        fs4->topField, fs4->npoints, 0.0 );
                if ( q2->hexDiff[id3] == -1.0 )
                    q2->hexDiff[id3] = topFieldCompressedDiff(q2->qtf[fs3->regionIdx],
                        fs3->topField, fs3->npoints, 0.0 );
                if ( q2->hexDiff[id4] == -1.0 )
                    q2->hexDiff[id4] = topFieldCompressedDiff(q2->qtf[fs4->regionIdx],
                        fs4->topField, fs4->npoints, 0.0 );
#else
                q1->hexDiff[id1] = topFieldCompressedDiff(q1->qtf[fs1->regionIdx],
                    fs1->topField, fs1->npoints, 0.0 );
                q1->hexDiff[id2] = topFieldCompressedDiff(q1->qtf[fs2->regionIdx],
                    fs2->topField, fs2->npoints, 0.0 );
                q2->hexDiff[id1] = topFieldCompressedDiff(q2->qtf[fs1->regionIdx],
                    fs1->topField, fs1->npoints, 0.0 );
                q2->hexDiff[id2] = topFieldCompressedDiff(q2->qtf[fs2->regionIdx],
                    fs2->topField, fs2->npoints, 0.0 );
                q1->hexDiff[id3] = topFieldCompressedDiff(q1->qtf[fs3->regionIdx],
                    fs3->topField, fs3->npoints, 0.0 );
                q1->hexDiff[id4] = topFieldCompressedDiff(q1->qtf[fs4->regionIdx],
                    fs4->topField, fs4->npoints, 0.0 );
                q2->hexDiff[id3] = topFieldCompressedDiff(q2->qtf[fs3->regionIdx],
                    fs3->topField, fs3->npoints, 0.0 );
                q2->hexDiff[id4] = topFieldCompressedDiff(q2->qtf[fs4->regionIdx],
                    fs4->topField, fs4->npoints, 0.0 );
#endif


                if ( q_featureFactor > 0.0 )
                {
                    dval[0] = q1->featureDiff[id1] + q2->featureDiff[id2];
                    dval[1] = q1->featureDiff[id2] + q2->featureDiff[id1];
                    dval[2] = q1->featureDiff[id3] + q2->featureDiff[id4];
                    dval[3] = q1->featureDiff[id4] + q2->featureDiff[id3];
                }
                else
                    dval[0] = dval[1] = dval[2] = dval[3] = 0.0;

                dval[0] += q1->hexDiff[id1] + q2->hexDiff[id2];
                dval[1] += q1->hexDiff[id2] + q2->hexDiff[id1];
                dval[2] += q1->hexDiff[id3] + q2->hexDiff[id4];
                dval[3] += q1->hexDiff[id4] + q2->hexDiff[id3];

#if 0
                fprintf(stderr,"%d %d with %d %d Feature: %8.2lf %8.2lf Steric: %8.2lf %8.2lf\n",
                    q1->id, q2->id, id1, id2, q1->featureDiff[id1],
                    q2->featureDiff[id2], q1->hexDiff[id1], q2->hexDiff[id2] );
                fprintf(stderr,"%d %d with %d %d Feature: %8.2lf %8.2lf Steric: %8.2lf %8.2lf\n",
                    q1->id, q2->id, id2, id1, q1->featureDiff[id2],
                    q2->featureDiff[id1], q1->hexDiff[id2], q2->hexDiff[id1] );
                fprintf(stderr,"%d %d with %d %d Feature: %8.2lf %8.2lf Steric: %8.2lf %8.2lf\n",
                    q1->id, q2->id, id3, id4, q1->featureDiff[id3],
                    q2->featureDiff[id4], q1->hexDiff[id3], q2->hexDiff[id4] );
                fprintf(stderr,"%d %d with %d %d Feature: %8.2lf %8.2lf Steric: %8.2lf %8.2lf\n",
                    q1->id, q2->id, id4, id3, q1->featureDiff[id4],
                    q2->featureDiff[id3], q1->hexDiff[id4], q2->hexDiff[id3] );
#endif

                hevCnts[0] = hevCnts[1] = fs1->hevCnt + fs2->hevCnt;
                hevCnts[2] = hevCnts[3] = fs3->hevCnt + fs4->hevCnt;
#if 0
                fprintf(stderr,"dvals: %8.2lf %8.2lf %8.2lf %8.2lf \n", dval[0], dval[1],
                    dval[2], dval[3] );
                fprintf(stderr,"hevCnts: %d %d min:%d\n", hevCnts[0], hevCnts[1],
                    q_minSubsetSize );
#endif
                max3 = 4;
                for ( k = 0; k < max3; k++ )
                {
                    if ( hevCnts[k] >= q_minSubsetSize )
                    {
                        if ( dval[k] < best )
                        {
                            best = bestsub = dval[k];
                            bestQ = i;
                            bestStr = j;
                            bestIdx = k;
                            SubIsBetter = 1;
                        }
                        else if ( dval[k] < bestsub )
                            bestsub = dval[k];

                        if ( dval[k] < q_bailout && qs2->subsetMap[j] == 0 )
                        {
                            qs2->subsetMap[j] = 1;
                        }
                    }
                }
            }
        }
    } /* end of subset */


#ifdef DEBUGDETAIL
    if ( debug2 )
    {
        /* dump array of difference matrix values */
        if ( regid )
            fprintf(debug2, "%s\n", regid );
        for ( i = 0; i < query->numFrags; i++ )
        {
            fs1 = query->frags + i;
            dptr = fs1->hexDiff;
            for ( j = 0; j < str->numFrags; j++ )
            {
                fprintf(debug2,"%7.2lf ", *(dptr+j) );
            }
            fprintf(debug2,"\n");
        }
        fprintf(debug2,"\n");
        for ( i = 0; i < query->numFrags; i++ )
        {
            fs1 = query->frags + i;
            for ( j = 0; j < str->numFrags; j++ )
            {
                fs2 = str->frags + j;
                fprintf(debug2,"%3d,%3d ", fs1->atomCnt, fs2->atomCnt );
            }
            fprintf(debug2,"\n");
        }
        fprintf(debug2,"\n");
        fprintf(debug2,"Query split 2\n");
        for ( i = 0; i < query->s2cnt; i++ )
        {
            qs2 = query->s2 + i;
            fprintf(debug2,"%d %d\n", qs2->frag1, qs2->frag2 );
        }
        fprintf(debug2,"\nStr split 2\n");
        for ( i = 0; i < str->s2cnt; i++ )
        {
            qs2 = str->s2 + i;
            fprintf(debug2,"%d %d\n", qs2->frag1, qs2->frag2 );
        }
        fprintf(debug2,"\nQuery split 3\n");
        for ( i = 0; i < query->s3cnt; i++ )
        {
            qs3 = query->s3 + i;
            fprintf(debug2,"%d %d %d\n", qs3->frag1, qs3->frag2, qs3->frag3 );
        }
        fprintf(debug2,"\nStr split 3\n");
        for ( i = 0; i < str->s3cnt; i++ )
        {
            qs3 = str->s3 + i;
            fprintf(debug2,"%d %d %d\n", qs3->frag1, qs3->frag2, qs3->frag3 );
        }
        fprintf(debug2,"\n");
    }
#endif


#if 0
    fprintf(stderr,"done with this one\n");
    fflush(stderr);
#endif
    if ( q_debugfp )
        fprintf(q_debugfp, "q %d str: %d idx %d 3is %d subis %d best2 %8.4lf best3 %8.4lf bestsub %8.4lf \n",
            bestQ, bestStr, bestIdx, threeIsBetter, SubIsBetter, best2, best3, bestsub );
    *r_qidx = bestQ;
    *r_sidx = bestStr;
    *r_splitidx = bestIdx;
    *r_three = threeIsBetter;
    *r_subsethit = SubIsBetter;
    if ( best2 < 0.0 )
        best2 = 0.0;
    *r_best2 = sqrt(best2);
    if ( best3 < 0.0 )
        best3 = 0.0;
    *r_best3 = sqrt(best3);
    *r_bestsub = sqrt(bestsub);
    if ( best < 0.0 )
        best = 0.0;
    return sqrt(best);

} 

static int get_details( topresult *res, Split *query, Split *str,
    int bestq, int bestStr, int bestIdx, int threeMatched, int subsetHit, int keepCts )
{
    split2 *qs2, *s2;
    split3 *qs3, *s3;
    int ids[3];
    Frag *f;
    Frag *sf;

    if ( subsetHit )
    {
        threeMatched = 0;
        if ( bestq < 0 || bestq >= query->s2cnt )
            return -1;
        if ( bestStr < 0 || bestStr >= str->s3cnt )
            return -1;
        qs2 = query->s2 + bestq;
        s3 = str->s3 + bestStr;
        switch ( bestIdx )
        {
#if 0
            dval[0] += q1->hexDiff[id1] + q2->hexDiff[id2];
            dval[1] += q1->hexDiff[id2] + q2->hexDiff[id1];
            dval[2] += q1->hexDiff[id3] + q2->hexDiff[id4];
            dval[3] += q1->hexDiff[id4] + q2->hexDiff[id3];
#endif
        case 0:
            ids[0] = s3->frag1;
            ids[1] = s3->frag2;
            break;
        case 1:
            ids[0] = s3->frag2;
            ids[1] = s3->frag1;
            break;
        case 2:
            ids[0] = s3->frag3;
            ids[1] = s3->frag4;
            break;
        case 3:
            ids[0] = s3->frag4;
            ids[1] = s3->frag3;
            break;
        default:
            return -1;
        }

        f = query->frags + qs2->frag1;
        sf = str->frags + ids[0];
        res->qids[0] = f->id;
        res->outside[0] = sf->outside;
        if ( f->ct && sf->ct )
        {
            res->qFrags[0] = f->ct;
            res->hexDiffs[0] = sqrt( f->hexDiff [ ids[0] ] );
            if ( q_partialMatch )
                f->featureDiff = f->featureSubsetDiff;
            if ( f->featureDiff )
                res->featureDiffs[0] = sqrt( f->featureDiff [ ids[0] ] );
            else
                res->featureDiffs[0] = 0.0;
        }
        else
        {
            res->hexDiffs[0] = 1.0;
            res->featureDiffs[0] = 1.0;
        }
        if ( sf->ct && keepCts )
            res->strFrags[0] = makeFragCopy(sf->ct, ids[0], -1 );
        f = query->frags + qs2->frag2;
        sf = str->frags + ids[1];
        res->qids[1] = f->id;
        res->outside[1] = sf->outside;
        if ( f->ct && sf->ct )
        {
            res->qFrags[1] = f->ct;
            res->hexDiffs[1] = sqrt( f->hexDiff [ ids[1] ] );
            if ( q_partialMatch )
                f->featureDiff = f->featureSubsetDiff;
            if ( f->featureDiff )
                res->featureDiffs[1] = sqrt( f->featureDiff [ ids[1] ] );
            else
                res->featureDiffs[1] = 0.0;
        }
        else
        {
            res->hexDiffs[1] = 1.0;
            res->featureDiffs[1] = 1.0;
        }
        if ( sf->ct && keepCts )
            res->strFrags[1] = makeFragCopy(sf->ct, ids[1], -1 );
        res->qids[2] = -1;
        res->strids[0] = ids[0];
        res->strids[1] = ids[1];
        res->strids[2] = -1;
    }

    else if ( threeMatched )
    {
        qs3 = query->s3 + bestq;
        s3 = str->s3 + bestStr;
        switch ( bestIdx )
        {
        case 0:
        case 2:
            ids[0] = s3->frag1;
            ids[1] = s3->frag2;
            ids[2] = s3->frag3;
            break;
        case 1:
        case 3:
            ids[0] = s3->frag4;
            ids[1] = s3->frag3;
            ids[2] = s3->frag2;
            break;
#if 0
        case 2:
            ids[0] = s3->frag2;
            ids[1] = s3->frag1;
            ids[2] = s3->frag3;
            break;
        case 3:
            ids[0] = s3->frag2;
            ids[1] = s3->frag3;
            ids[2] = s3->frag1;
            break;
        case 4:
            ids[0] = s3->frag3;
            ids[1] = s3->frag2;
            ids[2] = s3->frag1;
            break;
        case 5:
            ids[0] = s3->frag3;
            ids[1] = s3->frag1;
            ids[2] = s3->frag2;
            break;
#endif
        default:
            return -1;
        }

        res->hexDiffs[0] = res->hexDiffs[1] = res->hexDiffs[2] = 0.0;

        f = query->frags + qs3->frag1; /* always use the first query fragment for the center piece,
                                          report the corresponding best hit (avg anyway) fragment from the
                                          structure fragment */
        res->qids[0] = f->id;
        if ( f->ct )
        {
            res->qFrags[0] = f->ct;
            res->hexDiffs[0] = sqrt( f->hexDiff [ ids[0] ] );
            if ( q_partialMatch )
                f->featureDiff = f->feature3PDiff;
            if ( f->featureDiff )
                res->featureDiffs[0] = sqrt( f->featureDiff [ ids[0] ] );
            else
                res->featureDiffs[0] = 0.0;
        }
        f = str->frags + ids[0];
        if ( f->ct && keepCts )
            res->strFrags[0] = makeFragCopy(f->ct, ids[0], -1 );
        res->outside[0] = f->outside;


        f = query->frags + qs3->frag2;
        res->qids[1] = f->id;
        if ( f->ct )
        {
            res->qFrags[1] = f->ct;
            res->hexDiffs[1] = sqrt( f->hexDiff [ ids[1] ] );
            if ( q_partialMatch )
                f->featureDiff = f->feature3PDiff;
            if ( f->featureDiff )
                res->featureDiffs[1] = sqrt( f->featureDiff [ ids[1] ] );
            else
                res->featureDiffs[1] = 0.0;
        }
        f = str->frags + ids[1];
        if ( f->ct && keepCts )
            res->strFrags[1] = makeFragCopy(f->ct, ids[1], -1 );
        res->outside[1] = f->outside;

        f = query->frags + qs3->frag3;
        res->qids[2] = f->id;
        if ( f->ct )
        {
            res->qFrags[2] = f->ct;
            res->hexDiffs[2] = sqrt( f->hexDiff [ ids[2] ] );
            if ( q_partialMatch )
                f->featureDiff = f->feature3PDiff;
            if ( f->featureDiff )
                res->featureDiffs[2] = sqrt( f->featureDiff [ ids[2] ] );
            else
                res->featureDiffs[2] = 0.0;
        }
        f = str->frags + ids[2];
        if ( f->ct && keepCts )
            res->strFrags[2] = makeFragCopy(f->ct, ids[2], -1 );
        res->outside[2] = f->outside;

        res->strids[0] = ids[0];
        res->strids[1] = ids[1];
        res->strids[2] = ids[2];

    }
    else
    {
        /* A 2 piece hit */
        qs2 = query->s2 + bestq;
        s2 = str->s2 + bestStr;

        if ( bestIdx == 0 )
        {
            ids[0] = s2->frag1;
            ids[1] = s2->frag2;
        }
        else
        {
            ids[0] = s2->frag2;
            ids[1] = s2->frag1;
        }

        f = query->frags + qs2->frag1;
        sf = str->frags + ids[0];
        res->qids[0] = f->id;
        res->outside[0] = sf->outside;
        if ( f->ct && sf->ct )
        {
            res->qFrags[0] = f->ct;
            res->hexDiffs[0] = sqrt( f->hexDiff [ ids[0] ] );
            if ( q_partialMatch )
                f->featureDiff = f->feature2PDiff;
            if ( f->featureDiff )
                res->featureDiffs[0] = sqrt( f->featureDiff [ ids[0] ] );
            else
                res->featureDiffs[0] = 0.0;
        }
        if ( sf->ct && keepCts )
            res->strFrags[0] = makeFragCopy(sf->ct, ids[0], -1 );
        f = query->frags + qs2->frag2;
        sf = str->frags + ids[1];
        res->qids[1] = f->id;
        res->outside[1] = sf->outside;
        if ( f->ct && sf->ct )
        {
            res->qFrags[1] = f->ct;
            res->hexDiffs[1] = sqrt( f->hexDiff [ ids[1] ] );
            if ( q_partialMatch )
                f->featureDiff = f->feature2PDiff;
            if ( f->featureDiff )
                res->featureDiffs[1] = sqrt( f->featureDiff [ ids[1] ] );
            else
                res->featureDiffs[1] = 0.0;
        }
        if ( sf->ct && keepCts )
            res->strFrags[1] = makeFragCopy(sf->ct, ids[1], -1 );
        res->qids[2] = -1;
        res->strids[0] = ids[0];
        res->strids[1] = ids[1];
        res->strids[2] = -1;
    }
    return 0;
}

#if 0

static int debugHits( FILE *fp, Split *query, Split *str, int bestq, int bestStr, int bestIdx, int threeMatched )
{
    split2 *qs2, *s2;
    split3 *qs3, *s3;
    int ids[3];
    Frag *f;
    Frag *sf;

    if ( threeMatched )
    {
        qs3 = query->s3 + bestq;
        s3 = str->s3 + bestStr;
        switch ( bestIdx )
        {
        case 0:
        case 2:
            ids[0] = s3->frag1;
            ids[1] = s3->frag2;
            ids[2] = s3->frag3;
            break;
        case 1:
        case 3:
            ids[0] = s3->frag4;
            ids[1] = s3->frag3;
            ids[2] = s3->frag2;
            break;
#if 0
        case 2:
            ids[0] = s3->frag2;
            ids[1] = s3->frag1;
            ids[2] = s3->frag3;
            break;
        case 3:
            ids[0] = s3->frag2;
            ids[1] = s3->frag3;
            ids[2] = s3->frag1;
            break;
        case 4:
            ids[0] = s3->frag3;
            ids[1] = s3->frag2;
            ids[2] = s3->frag1;
            break;
        case 5:
            ids[0] = s3->frag3;
            ids[1] = s3->frag1;
            ids[2] = s3->frag2;
            break;
#endif
        default:
            return -1;
        }

        f = query->frags + qs3->frag1;
        if ( f->ct )
        {
            fprintf(fp,"# diff %8.4lf \n", sqrt( f->hexDiff [ ids[0] ] ) );
            if ( bestIdx <= 1 )
                writeCopy(fp, f->ct, qs3->frag1, (int) sqrt( f->hexDiff[ ids[0] ] ),
                    "TS_QID" );
            else
                writeCopy(fp, f->ct, qs3->frag4, (int) sqrt( f->hexDiff[ ids[0] ] ),
                    "TS_QID" );
            f = str->frags + ids[0];
            if ( f->ct )
                writeCopy(fp, f->ct, ids[0], -1, "TS_SID");
        }
        f = query->frags + qs3->frag2;
        if ( f->ct )
        {
            fprintf(fp,"# diff %8.4lf \n", sqrt( f->hexDiff [ ids[1] ] ) );
            writeCopy(fp, f->ct, qs3->frag2, (int) sqrt( f->hexDiff[ ids[1] ] ), "TS_QID" );
            f = str->frags + ids[1];
            if ( f->ct )
                writeCopy(fp, f->ct, ids[1], -1, "TS_SID");
        }
        f = query->frags + qs3->frag3;
        if ( f->ct )
        {
            fprintf(fp,"# diff %8.4lf \n", sqrt( f->hexDiff [ ids[2] ] ) );
            writeCopy(fp, f->ct, qs3->frag3, (int) sqrt( f->hexDiff[ ids[2] ] ), "TS_QID");
            f = str->frags + ids[2];
            if ( f->ct )
                writeCopy(fp, f->ct, ids[2], -1, "TS_SID");
        }
    }

    else
    {
        qs2 = query->s2 + bestq;
        s2 = str->s2 + bestStr;

        if ( bestIdx == 0 )
        {
            ids[0] = s2->frag1;
            ids[1] = s2->frag2;
        }
        else
        {
            ids[0] = s2->frag2;
            ids[1] = s2->frag1;
        }
        f = query->frags + qs2->frag1;
        sf = str->frags + ids[0];
        if ( f->ct && sf->ct )
        {
            fprintf(fp,"# diff %8.4lf \n", sqrt( f->hexDiff [ ids[0] ] ) );
            writeCopy(fp, f->ct, qs2->frag1, (int) sqrt( f->hexDiff[ ids[0] ] ), "TS_QID" );
            writeCopy(fp, sf->ct, ids[0], -1, "TS_SID" );
        }
        f = query->frags + qs2->frag2;
        sf = str->frags + ids[1];
        if ( f->ct && sf->ct )
        {
            fprintf(fp,"# diff %8.4lf \n", sqrt( f->hexDiff [ ids[1] ] ) );
            writeCopy(fp, f->ct, qs2->frag2, (int) sqrt( f->hexDiff[ ids[1] ] ), "TS_QID" );
            writeCopy(fp, sf->ct, ids[1], -1, "TS_SID" );
        }
    }
    return 0;
}

#endif

static struct CtConnectionTable *makeFragCopy(struct CtConnectionTable *ct, int id, int hexdiff )
{
    char regName[80];
    char *regid;
    struct CtConnectionTable *copyct;

    copyct = DB_CT_UTL_DUP_CT(ct, CtCopyKeepAllAttrs );
    if ( !copyct )
        return copyct;
    regid = (char *) 0;
    DB_CT_GET_CT_ATTR(ct, CtCtRegId, &regid );
    if ( hexdiff != -1 )
        sprintf(regName,"%s_%d_%d", (regid) ? regid : "str", id+1, hexdiff);
    else
        sprintf(regName,"%s_%d", (regid) ? regid : "str", id+1 );
    DB_CT_SET_CT_NAME_OR_REGID(copyct, CtCtRegId, regName );
    return copyct;
}


static void setAttr(struct CtConnectionTable *ct, char *name, char *value )
{
    char *tval;

    tval = (char *) 0;
    DB_CT_GET_CT_ATTR(ct, CtCtUserValue, &tval, name );
    if ( tval )
        DB_CT_UTL_MOD_SIMPLE_CT_ATTR(ct, CtCtUserValue, value, name );
    else
        DB_CT_SET_CT_ATTR(ct, CtCtUserValue, value, name );
    UTL_ERROR_CLEAR();

} 

static void writeCopy(FILE *fp, struct CtConnectionTable *ct, int id, int hexdiff, char *fieldname )
{
    struct CtConnectionTable *copyct;
    char value[80];

    copyct = makeFragCopy(ct, id, hexdiff );
    if ( !copyct )
        return;
    if ( fieldname )
    {
        sprintf(value, "%d", id+1 );
        setAttr(copyct, fieldname, value );
    }
    DB_CT_WRITE(fp, copyct );
    DB_CT_DELETE_CT(copyct);

} 
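
makeFragCopy above formats names of the form regid_index or regid_index_hexdiff into a fixed 80-character buffer with sprintf. A bounded variant might look like the sketch below. The helper `make_frag_name` is hypothetical and is not part of the listing; it only illustrates the naming convention and reports truncation instead of overrunning the buffer.

```c
#include <stdio.h>

/* Hypothetical helper, not part of the listing.
 * Writes "<regid>_<id+1>" or "<regid>_<id+1>_<hexdiff>" into buf,
 * falling back to "str" when regid is NULL, as makeFragCopy does.
 * Returns 0 on success, -1 if the name did not fit in size bytes. */
static int make_frag_name(char *buf, size_t size, const char *regid,
                          int id, int hexdiff)
{
    const char *base = regid ? regid : "str";
    int n;

    if (hexdiff != -1)
        n = snprintf(buf, size, "%s_%d_%d", base, id + 1, hexdiff);
    else
        n = snprintf(buf, size, "%s_%d", base, id + 1);

    /* snprintf returns the length it wanted to write */
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```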


static int getAtomIds(CtConnectionTable *ct, int a1, int *r_a2, int *r_a3 )
{
    CtAtom *A;
    CtAtom *a3;
    int i;
    CtAtomBondData *b;

    A = ct->atoms + a1;
    *r_a2 = A->bond->toAtom;
    *r_a3 = -1;
    A = ct->atoms + *r_a2;
    for ( i = 0, b = A->bond; i < A->bondCount; i++, b++ )
    {
        if ( b->toAtom != a1 )
        {
            a3 = ct->atoms + b->toAtom;
            if ( *r_a3 == -1 || a3->id.atomicNumber != HYDROGEN )
                *r_a3 = b->toAtom;
            if ( a3->id.atomicNumber != HYDROGEN )
                return 0;
        }
    }
    return -1;

} 

/***********************************************************************
 * modified from:
 * int SYB_MGEN_CONN_CFA_DIFF( identifier, nargs, args, writer )
 * Dick Cramer, Nov. 20, 1996
 *
 * Computes difference between two CoMFA fields, represented as text
 * strings encoded by the expression generator %cfa_hex()
 * C function CT_FIELD2HEX()
 ***********************************************************************/
static double fieldHexDiff( char *cptr, char *cqtr, int nosq )
{
#define pow2(a) ( (a) * (a) )
    static double boundary[16];
    static double Dist[16][16];
    static double DnSq[16][16];
    static int InitDist;
    double xount;
    int i, j, nch, ptr, qtr;
    char tempString[25];

    if ( !cptr || !cqtr )
        return 999999.0;

    if ( (nch = strlen(cptr)) != strlen(cqtr) )
        return 999999.0;

    /* initialization on 1st call */
    if ( !InitDist )
    {
        boundary[0] = 9999.;
        boundary[1] = -0.1;
        for ( i = 2; i < 15; i++ )
            boundary[i] = 2*i-3;
        boundary[15] = 30.0;

        for ( i = 0; i < 16; i++ )
        {
            for ( j = 0; j < 16; j++ )
            {
                DnSq[i][j] = (double) fabs( boundary[i] - boundary[j] );
                Dist[i][j] = pow2( boundary[i] - boundary[j] );
            }
        }

        InitDist = 1;
    }

    /* each pair of hex characters holds two 4-bit field bins */
    for ( xount = 0.0, i = 0; i < nch; i += 2, cptr += 2, cqtr += 2 )
    {
        sscanf( cptr, "%2x", &ptr );
        sscanf( cqtr, "%2x", &qtr );

        xount += nosq ?
            DnSq[ ptr & 0x0F ][ qtr & 0x0F ]
            + DnSq[ (ptr & 0xF0) >> 4 ][ (qtr & 0xF0) >> 4 ]
            :
            Dist[ ptr & 0x0F ][ qtr & 0x0F ]
            + Dist[ (ptr & 0xF0) >> 4 ][ (qtr & 0xF0) >> 4 ];
    }

    return (nosq && xount > 0.0 ) ? xount : sqrt( xount );
}
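
The comparison performed by fieldHexDiff can be illustrated in isolation. The following is only a minimal sketch under stated assumptions: the names bin_value, hex_digit and hex_field_sqdiff are hypothetical and do not appear in the listing, each hex character is decoded by hand instead of through sscanf, and unusable input is reported with a negative value rather than 999999.0. It maps every 4-bit bin to the representative field value of the boundary table and sums the squared differences.

```c
#include <stddef.h>
#include <string.h>

/* Representative field value for a 4-bit bin, following the boundary table
 * built on the first call of fieldHexDiff. */
static double bin_value(int bin)
{
    if (bin == 0)  return 9999.0;
    if (bin == 1)  return -0.1;
    if (bin == 15) return 30.0;
    return 2.0 * bin - 3.0;             /* bins 2..14 */
}

/* Decode one hex digit (0-9, a-f, A-F); returns -1 for any other character. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Sum of squared bin-value differences between two equally long hex strings.
 * Returns a negative value when the strings cannot be compared (hypothetical
 * convention chosen for this sketch). */
static double hex_field_sqdiff(const char *p, const char *q)
{
    size_t n, i;
    double sum = 0.0;

    if (!p || !q) return -1.0;
    n = strlen(p);
    if (n != strlen(q)) return -1.0;

    for (i = 0; i < n; i++) {
        int a = hex_digit(p[i]);
        int b = hex_digit(q[i]);
        double d;

        if (a < 0 || b < 0) return -1.0;
        d = bin_value(a) - bin_value(b);
        sum += d * d;
    }
    return sum;
}
```

Because every hex character carries exactly one bin, walking the string one character at a time gives the same sum as splitting each decoded byte into its low and high nibbles.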

static char *hexStringToInts(char *cptr, int *r_size)
{
    int len, i;
    char *arr;
    int idx;

    *r_size = 0;
    if ( !cptr )
        return (char *) 0;
    len = strlen(cptr);
    arr = malloc(len);

    for ( i = idx = 0; i < len; i++, cptr++, idx++ )
    {
        if ( *cptr <= '9' )
            arr[idx] = *cptr - '0';
        else
            arr[idx] = *cptr - 'a' + 10;
    }

    *r_size = len;
    return arr;
}


static double *compressField(double *topfield, int npoints )
{
    static double minv = -0.40;
    static double maxv = 0.40;
    static int nreported;
    static int maxalloc;
    static double *tbuff;
    static int ncomp;
    static int tpoints;
    static int newPoints;
    int cnt;
    double *tptr;
    double *cfield;
    int dsize;
    int i;
    double *fptr;
    int needpoints;
#ifdef NUMBER_OF_COMPRESSION_FIELDS
    double totals[NUMBER_OF_COMPRESSION_FIELDS];
    int cnts[NUMBER_OF_COMPRESSION_FIELDS];
    int gridsize;
    int grid;

    gridsize = npoints / NUMBER_OF_COMPRESSION_FIELDS;
    for ( i = 0; i < NUMBER_OF_COMPRESSION_FIELDS; i++ )
    {
        totals[i] = 0.0;
        cnts[i] = 0;
    }
#endif

    /* grow the shared work buffer when this field does not fit */
    needpoints = npoints + COMPRESSION_POINTS;
    if ( needpoints > maxalloc )
    {
        if ( tbuff )
            free( (char *) tbuff );
        if ( maxalloc == 0 )
            maxalloc = 2000;
        while ( maxalloc < needpoints )
            maxalloc *= 2;
        tbuff = (double *) malloc(sizeof(double) * maxalloc );
    }

    for ( i = cnt = dsize = COMPRESSION_POINTS, tptr = tbuff + COMPRESSION_POINTS,
          fptr = topfield; i < npoints; i++, fptr++ )
    {
        /* a point joins a run when it is near zero and either a run is already
           open or the next point is near zero as well */
        if ( ( *fptr < maxv && *fptr > minv ) &&
             ( cnt > 0 || ( (i+1) < npoints && *(fptr+1) < maxv && *(fptr+1) > minv ) ) )
            cnt++;
        else
        {
            if ( cnt )
            {
                /* close the run: store its length as (cnt + 100), then the value */
                *tptr++ = (double) (cnt + 100);
                *tptr++ = *fptr;
                dsize += 2;
                cnt = 0;
            }
            else
            {
                *tptr++ = *fptr;
                dsize++;
            }
#ifdef NUMBER_OF_COMPRESSION_FIELDS
            if ( *fptr > 1.0 )
            {
                grid = i / gridsize;
                if ( grid >= NUMBER_OF_COMPRESSION_FIELDS )
                    grid = NUMBER_OF_COMPRESSION_FIELDS - 1;
                cnts[grid] += 1;
                totals[grid] += *fptr * *fptr;
            }
#endif
        }
    }

    if ( cnt )
    {
        *tptr++ = (double) (cnt + 100);
        dsize++;
    }

#ifdef NUMBER_OF_COMPRESSION_FIELDS
    for ( i = 0; i < NUMBER_OF_COMPRESSION_FIELDS; i++ )
    {
        tbuff[i] = 0.0;
        if ( cnts[i] > 0 )
            tbuff[i] = totals[i] / (double) cnts[i];
        tbuff[ i + NUMBER_OF_COMPRESSION_FIELDS ] = cnts[i];
    }
#endif

    cfield = (double *) malloc(sizeof(double) * dsize );
    memcpy((char *) cfield, tbuff, sizeof(double) * dsize );

#if 0
    if ( nreported < 3 )
    {
        ncomp++;
        tpoints += npoints;
        newPoints += dsize;

        if ( ncomp == 1000 )
        {
            fprintf(stderr, "compression average for last %d frags: %6.2lf %d / %d\n",
                ncomp,
                (double) (newPoints * 100) / (double) tpoints,
                newPoints, tpoints );
            tpoints = newPoints = ncomp = 0;
            nreported++;
        }
    }
#endif
#if 0
    fprintf(stderr, "compressed perc: %5.1lf new size: %d old size: %d\n",
        (double) (dsize*100)/(npoints), dsize, npoints );
#endif
#if 0
    fprintf(stderr, "un-compressed\n");
    for ( i = 0, fptr = topfield; i < npoints; i++, fptr++ )
        fprintf(stderr, "%6.2lf%s", *fptr, ((i+1) % 20) ? " " : "\n" );

    fprintf(stderr, "\ncompressed:\n");
    for ( i = 0, fptr = cfield; i < dsize; i++, fptr++ )
        fprintf(stderr, "%6.2lf%s", *fptr, ((i+1) % 20) ? " " : "\n" );

    fprintf(stderr, "\n");
#endif

    return cfield;
}

static double topFieldCompressedDiff(double *start_qry, double *start_str, int npoints, double startPenalty )
{
    int i, j, k, minval;
    double dval, qval, sval, filtval;
    int qrySkip, strSkip;
    int qpoints, spoints;
    double *qry, *str;
#ifdef NUMBER_OF_COMPRESSION_FIELDS
    int distCnt1[NUMBER_OF_COMPRESSION_FIELDS];
    int distCnt2[NUMBER_OF_COMPRESSION_FIELDS];
    int dist;
    double avgval;
    double avg1[NUMBER_OF_COMPRESSION_FIELDS];
    double avg2[NUMBER_OF_COMPRESSION_FIELDS];
#endif

    if ( !start_qry || !start_str || !npoints )
        return 9999.0*9999.0;
    t_fcompare++;

#ifdef NUMBER_OF_COMPRESSION_FIELDS
    /* cheap pre-filter on the per-region summaries stored ahead of the field data */
    filtval = startPenalty;
    for ( i = 0; i < NUMBER_OF_COMPRESSION_FIELDS; i++ )
    {
        avg1[i] = start_qry[i];
        distCnt1[i] = start_qry[i + NUMBER_OF_COMPRESSION_FIELDS];
        avg2[i] = start_str[i];
        distCnt2[i] = start_str[i + NUMBER_OF_COMPRESSION_FIELDS];
#if 0
        fprintf(stderr, "%d: cnts: %d vs %d avg:%9.3lf %9.3lf\n",
            i, distCnt1[i], distCnt2[i], avg1[i], avg2[i] );
#endif
    }

    for ( i = 0; i < NUMBER_OF_COMPRESSION_FIELDS && filtval < q_bailout; i++ )
    {
        dist = abs(distCnt1[i] - distCnt2[i] );
        if ( distCnt1[i] > distCnt2[i] )
        {
            dist = distCnt1[i] - distCnt2[i];
            avgval = avg1[i];
        }
        else
        {
            dist = distCnt2[i] - distCnt1[i];
            avgval = avg2[i];
        }
        filtval += avgval * (double) dist;
    }

    if ( filtval > q_bailout )
        return filtval;
#endif

    i = 0;
    sval = 0.0;
    strSkip = qrySkip = 0;
    qpoints = spoints = 0;

    qry = start_qry + COMPRESSION_POINTS;
    str = start_str + COMPRESSION_POINTS;
    while ( qpoints < npoints && spoints < npoints && sval < q_bailout )
    {
        if ( qrySkip < 0 )
            qrySkip = 0;
        if ( strSkip < 0 )
            strSkip = 0;

        /* a stored value above 100 is a run marker: (value - 100) points to skip */
        if ( qrySkip == 0 && *qry > 100.0 )
        {
            qrySkip = (int) (*qry - 100.0);
        }

        if ( strSkip == 0 && *str > 100.0 )
        {
            strSkip = (int) (*str - 100.0);
        }

        /* Example:
           compressed: Query
           117.00 3.18 3.21 104.00 30.00 30.00 30.00 103.00 30.00 30.00 30.00 1.17 103.00 26.87 30.00 30.00 5.30 117.00 29.64 4.78
           30.00 30.00 0.20 101.00 5.30 30.00 30.00 30.00 30.00 13.90 101.00 30.00 30.00 30.00 30.00 4.77 102.00 3.72 30.00 30.00
           30.00 1.05 117.00 29.64 5.86 30.00 30.00 0.19 101.00 5.33 30.00 30.00 30.00 30.00 13.89 101.00 30.00 30.00 30.00 30.00
           4.54 102.00 3.61 30.00 27.54 120.00 0.19 3.70 3.84 104.00 30.00 30.00 30.00 103.00 1.13 15.09 3.12 0.25
           compressed: Str
           122.00 1.76 0.67 105.00 30.00 30.00 1.47 104.00 1.75 0.68 125.00 3.64 21.47 30.00 9.03 103.00 30.00 30.00 30.00 26.83
           103.00 3.65 21.46 30.00 9.12 119.00 0.31 8.11 103.00 3.64 19.21 30.00 30.00 103.00 30.00 30.00 30.00 30.00 0.28 102.00
           3.65 19.31 30.00 30.00 119.00 1.44 24.84 2.35 104.00 30.00 30.00 30.00 103.00 15.28 30.00 30.00 30.00 1.40 103.00 7.38
           30.00 0.21 119.00 1.64 3.18 105.00 30.00 30.00 105.00 30.00 30.00
        */


        if ( strSkip == 0 && qrySkip == 0 )
        {
            /* both sides hold literal values: accumulate squared differences */
            while ( spoints < npoints && qpoints < npoints && *str < 100.0 && *qry < 100.0 )
            {
                dval = (*str - *qry) * autoScaleFactor;
                dval *= dval;
                sval += dval;
                str++;
                qry++;
                qpoints++;
                spoints++;
            }
        }

        else
        {
#if 0
            fprintf(stderr, "start: %d %d %d %d %d %8.2lf strIdx:%d qryIdx:%d\n",
                strSkip, qrySkip, spoints, qpoints, npoints, sval,
                (int) (str - start_str), (int) (qry - start_qry) );
#endif
            if ( strSkip > qrySkip )
            {
                if ( qrySkip > 0 )
                {
                    qpoints += qrySkip;
                    spoints += qrySkip;
                    strSkip -= qrySkip;
                    qrySkip = 0;
                    qry++;
                }

                /* the structure is still skipping: query values count in full */
                while ( strSkip && qpoints < npoints && *qry < 100.0 )
                {
                    dval = *qry * autoScaleFactor;
                    dval *= dval;
                    sval += dval;
                    strSkip--;
                    qpoints++;
                    spoints++;
                    qry++;
                }
                if ( strSkip == 0 )
                    str++;
            }
            else if ( qrySkip > strSkip )
            {
                if ( strSkip > 0 )
                {
                    qpoints += strSkip;
                    spoints += strSkip;
                    qrySkip -= strSkip;
                    strSkip = 0;
                    str++;
                }

                /* the query is still skipping: structure values count in full */
                while ( qrySkip && spoints < npoints && *str < 100.0 )
                {
                    dval = *str * autoScaleFactor;
                    dval *= dval;
                    sval += dval;
                    qrySkip--;
                    qpoints++;
                    spoints++;
                    str++;
                }
                if ( qrySkip == 0 )
                    qry++;
            }
            else
            {
                /* they are the same, what luck */
                qpoints += qrySkip;
                spoints += strSkip;
                qrySkip = 0;
                strSkip = 0;
                str++;
                qry++;
            }
        }
    }

    /* Only one of the while loops can process */
    while ( qpoints < npoints )
    {
        if ( *qry < 100.0 )
        {
            dval = *qry * autoScaleFactor;
            dval *= dval;
            sval += dval;
            qpoints++;
        }
        else
        {
            qrySkip = (int) (*qry - 100.0);
            qpoints += qrySkip;
        }
        qry++;
    }

    while ( spoints < npoints )
    {
        if ( *str < 100.0 )
        {
            dval = *str * autoScaleFactor;
            dval *= dval;
            sval += dval;
            spoints++;
        }
        else
        {
            strSkip = (int) (*str - 100.0);
            spoints += strSkip;
        }
        str++;
    }

#if 0
    if ( filtval > sval )
    {
        fprintf(stderr, "filt higher than actual: %8.4lf actual: %8.4lf\n", filtval, sval );
    }

    if ( sval > q_bailout )
    {
        fprintf(stderr, "ACTUAL more than bailout: %8.3lf filtval: %8.3lf bail:%8.3lf \n", sval,
            filtval, q_bailout );
    }

    if ( filtval > q_bailout )
        fprintf(stderr, "compressed field bailout %8.4lf actual: %8.4lf bailout: %8.4lf %s\n",
            filtval, sval, q_bailout,
            (sval > q_bailout ) ? "WORKED" : "FAILED" );
#endif
    return sval;
}
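
The run-length convention that compressField writes and topFieldCompressedDiff reads can be shown with a small decoder. This is a minimal sketch under stated assumptions, not the routine from the listing: expand_field and squared_diff are hypothetical names, the COMPRESSION_POINTS header and autoScaleFactor are left out, and dropped points are restored as 0.0. Expanding both fields and taking a plain sum of squared differences yields the value the interleaved walk above is designed to reach without expanding anything.

```c
#include <stddef.h>

/* Expand a run-length compressed field into out[0..npoints-1].
 * A stored value above 100.0 stands for (value - 100) consecutive points that
 * were close enough to zero to be dropped; any other value is a literal point.
 * Returns the number of points written, or -1 if a run overruns npoints. */
static int expand_field(const double *comp, int ncomp, double *out, int npoints)
{
    int ci;
    int oi = 0;

    for (ci = 0; ci < ncomp && oi < npoints; ci++) {
        if (comp[ci] > 100.0) {
            int run = (int)(comp[ci] - 100.0);

            if (oi + run > npoints)
                return -1;
            while (run-- > 0)
                out[oi++] = 0.0;        /* dropped points are treated as zero */
        } else {
            out[oi++] = comp[ci];
        }
    }
    return oi;
}

/* Plain sum of squared differences over two expanded fields. */
static double squared_diff(const double *a, const double *b, int npoints)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < npoints; i++) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}
```

For instance, the entries 102, 3, 30, 101 expand to the five points 0, 0, 3, 30, 0. Comparing the compressed forms directly, as the listing does, avoids allocating the expanded arrays for every fragment pair.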

static double topFieldDiff(double *qry, double *str, int npoints )
{
    double dval;
    double sval;
    int i;

    if ( !qry || !str || !npoints )
        return 9999.0*9999.0;

    for ( i = 0, sval = 0.0; i < npoints; i++ )
    {
        dval = *qry++ - *str++;
        dval *= dval;
        sval += dval;
    }

    t_fcompare++;
    return sval;
}

static double fieldIntDiff( char *cptr, char *cqtr, int s1, int s2 )
{
    static double Dist[16][16];
    static int InitDist;
    double xount;
    int i;

    if ( s1 != s2 || !cptr || !cqtr )
        return 999999.0;

    /* initialization on 1st call */
    if ( !InitDist )
    {
        int j;
        double dval;
        double boundary[16];

        boundary[0] = 9999.;
        boundary[1] = -0.1;
        for ( i = 2; i < 15; i++ )
            boundary[i] = 2*i-3;
        boundary[15] = 30.0;

        for ( i = 0; i < 16; i++ )
        {
            for ( j = 0; j < 16; j++ )
            {
                dval = boundary[i] - boundary[j];
                Dist[i][j] = dval * dval;
            }
        }
        InitDist = 1;
    }

    for ( xount = 0.0, i = 0; i < s1; i++, cptr++, cqtr++ )
    {
        xount += Dist[*cptr][*cqtr];
    }
    t_fcompare++;

    return xount;
}

#if 0

static double 2nd_fieldIntDiff( unsigned short *cptr, unsigned short *cqtr, int s1, int s2 )
{
#define pow2(a) ( (a) * (a) )
    static double boundary[16];
    static double Dist[16][16];
    static double DnSq[16][16];
    static int InitDist;
    double xount;
    double dval;
    int i, j, nch, ptr, qtr;
    char tempString[25];

    if ( s1 != s2 || !cptr || !cqtr )
        return 999999.0;

    /* initialization on 1st call */
    if ( !InitDist )
    {
        boundary[0] = 9999.;
        boundary[1] = -0.1;
        for ( i = 2; i < 15; i++ )
            boundary[i] = 2*i-3;
        boundary[15] = 30.0;

        for ( i = 0; i < 16; i++ )
        {
            for ( j = 0; j < 16; j++ )
            {
                dval = boundary[i] - boundary[j];
                DnSq[i][j] = (double) fabs( dval );
                Dist[i][j] = dval * dval;
            }
        }
        InitDist = 1;
    }

    for ( xount = 0.0, i = 0; i < s1; i++, cptr++, cqtr++ )
    {
        ptr = (int) *cptr;
        qtr = (int) *cqtr;

        xount += Dist[ ptr & 0x0F ][ qtr & 0x0F ]
               + Dist[ (ptr & 0xF0) >> 4 ][ (qtr & 0xF0) >> 4 ];
    }
    t_fcompare++;

    return xount;
}

static double fieldIntDiffSq( unsigned short *cptr, unsigned short *cqtr, int s1, int s2 )
{
    double rval;

    if ( s1 != s2 || !cptr || !cqtr )
        return 999999.0;
    rval = fieldIntDiff( cptr, cqtr, s1, s2 );
    if ( rval <= 0.0 )
        return 0.0;
    return sqrt( rval );
}

#endif


int TOP_GET_STATS(int dumpRegions, int *r_tfrags, int *r_2compare, int *r_3compare, int
        *r_fcompare, int *r_filtered, int *r_feat, double *r_outsidePerc )
{
    double perc;
    double tregions;
    int i;

    *r_tfrags = tot_uniq_frags;
    *r_2compare = t_2compare;
    *r_3compare = t_3compare;
    *r_fcompare = t_fcompare;
    *r_filtered = t_filtered;
    *r_feat = t_featFiltered;

    if ( t_fields )
    {
        perc = ( (double) t_outside * 100.0 ) / (double) t_fields;
        *r_outsidePerc = perc;
    }
    else
        *r_outsidePerc = 0.0;

    if ( dumpRegions )
    {
        for ( i = tregions = 0; i < max_regions; i++ )
        {
            tregions += regionUseCnts[i];
        }

        if ( tregions )
        {
            fprintf(stderr, "Region stats: \n" );
            for ( i = 0; i < max_regions; i++ )
                fprintf(stderr, " %5.2lf ", ( (double) regionUseCnts[i] * 100.0 ) / (double) tregions );
            fprintf(stderr, "\n\n");
        }
    }
}

static double computeAttachmentPenalty( Frag *qry, Frag *str, Frag *other_qry, Frag *other_str )
{
    double *qry_cords;
    double *str_cords;
    double dx, dy, dz;
    double pen;

    if ( !qry->cords || !str->cords )
        return 0.0;
    pen = 0.0;

#if 0
    /*
       The query cords and structure cords copyBaseAtom point to the origin, so
       we don't need to compare them, we need to compare the other base atom, where
       it is now.
       Don't need to do this set, it's always zero, this is the atom which is at the origin.
    */
    qry_cords = qry->cords + (qry->copyBaseAtom*3);
    str_cords = str->cords + (str->copyBaseAtom*3);

    dx = *qry_cords - *str_cords;
    dy = *(qry_cords+1) - *(str_cords+1);
    dz = *(qry_cords+2) - *(str_cords+2);

    pen = (dx*dx + dy*dy + dz*dz) * q_attachPenFactor;
#ifdef DEBUG_DETAIL
    if ( q_debugfp )
        fprintf(q_debugfp, "# attach qry: %d str:%d %6.2lf %6.2lf %6.2lf %8.3lf (atoms: %d %d) (bases: %d %d %d %d)\n",
            qry->id+1, str->id+1, dx, dy, dz, pen,
            qry->ct->atomCount, str->ct->atomCount,
            qry->copyBaseAtom, str->copyBaseAtom, other_qry->copyBaseAtom,
            other_str->copyBaseAtom );
#endif

#endif

    /* squared distance between the positions of the other attachment atom */
    qry_cords = qry->cords + (other_qry->copyBaseAtom*3);
    str_cords = str->cords + (other_str->copyBaseAtom*3);

    dx = *qry_cords - *str_cords;
    dy = *(qry_cords+1) - *(str_cords+1);
    dz = *(qry_cords+2) - *(str_cords+2);

    pen += (dx*dx + dy*dy + dz*dz) * q_attachPenFactor;
    return pen;
}

static int double_compare(const void *vnrec, const void *vtrec )
{
    double *n = (double *) vnrec;
    double *t = (double *) vtrec;

    /* compare explicitly; truncating the difference to int would make
       values less than 1.0 apart compare as equal */
    if ( *n < *t )
        return -1;
    if ( *n > *t )
        return 1;
    return 0;
}

static void PartialMatchFeatures(Split *qs, int mode, Frag *q1, Frag *q2, Frag *q3, Frag *q4, Split *str,
        Frag *f1, Frag *f2, Frag *f3, Frag *f4, int matchCnt )
{
    double *aa, *da;
    double *either;
    int *both;
    double splitDiff;
    int i, cnt;
    int atomCount;
    int fcnt1, fcnt2, fcnt3, fcnt4;
    int noFrags;
    static Split *last_split;

    if ( !qs || !qs->ct || !q1 || !q2 || !str || !f1 || !f2 || matchCnt == 0 || !qs->featureMask )
        return;
    if ( last_split != qs )
        qs->connectedHBCnt = (int *) 0;
    last_split = qs;

    atomCount = qs->ct->atomCount;
    aa = (double *) calloc(sizeof(double), atomCount );
    da = (double *) calloc(sizeof(double), atomCount );
    either = (double *) calloc(sizeof(double), atomCount );
    both = (int *) calloc(sizeof(int), atomCount );

    for ( i = 0; i < atomCount; i++ )
    {
        either[i] = da[i] = aa[i] = -1.0;
    }

    if ( mode == 2 )
    {
        q1->featureDiff = q1->feature2PDiff;
        q2->featureDiff = q2->feature2PDiff;
        if ( q3 )
            q3->featureDiff = q3->feature2PDiff;
        if ( q4 )
            q4->featureDiff = q4->feature2PDiff;
    }
    else if ( mode == 3 )
    {
        q1->featureDiff = q1->feature3PDiff;
        q2->featureDiff = q2->feature3PDiff;
        if ( q3 )
            q3->featureDiff = q3->feature3PDiff;
        if ( q4 )
            q4->featureDiff = q4->feature3PDiff;
    }
    else
    {
        q1->featureDiff = q1->featureSubsetDiff;
        q2->featureDiff = q2->featureSubsetDiff;
        if ( q3 )
            q3->featureDiff = q3->featureSubsetDiff;
        if ( q4 )
            q4->featureDiff = q4->featureSubsetDiff;
    }

    fcnt1 = fcnt2 = fcnt3 = fcnt4 = 0;

    q1->featureDiff[f1->id] = MeasureClosest(qs, q1, str, f1, da, aa, &fcnt1 );
    q2->featureDiff[f2->id] = MeasureClosest(qs, q2, str, f2, da, aa, &fcnt2 );
    if ( q3 && f3 )
        q3->featureDiff[f3->id] = MeasureClosest(qs, q3, str, f3, da, aa, &fcnt3 );
    if ( q4 && f4 )
        q4->featureDiff[f4->id] = MeasureClosest(qs, q4, str, f4, da, aa, &fcnt4 );

    /* count the fragments that contributed at least one donor or acceptor */
    noFrags = 0;
    if ( fcnt1 )
        noFrags++;
    if ( fcnt2 )
        noFrags++;
    if ( fcnt3 )
        noFrags++;
    if ( fcnt4 )
        noFrags++;

    for ( i = cnt = 0; i < atomCount; i++ )
    {
        if ( da[i] != -1.0 && ( either[i] == -1.0 || da[i] < either[i] ) )
            either[i] = da[i], cnt++;
        if ( aa[i] != -1.0 && ( either[i] == -1.0 || aa[i] < either[i] ) )
            either[i] = aa[i], cnt++;
        if ( da[i] != -1.0 && aa[i] != -1.0 )
            both[i] = 1;
    }
#if 0
    fprintf(stderr, "%d %d %d %d frags:%d frag_cnt:%d\n", fcnt1, fcnt2, fcnt3, fcnt4, noFrags,
        cnt );
#endif
    CoverConnectedHB(qs, qs->ct, either );

    /* pack the surviving per-atom differences into the front of aa[] */
    for ( i = cnt = 0, splitDiff = 0.0; i < atomCount; i++ )
    {
        if ( either[i] != -1.0 )
        {
            if ( both[i] == 0 )
                aa[cnt] = either[i];
            else
                aa[cnt] = ( aa[i] + da[i] ) / 2.0;
            splitDiff += aa[cnt];
            cnt++;
        }
    }

    if ( cnt > matchCnt )
        qsort( (void *) aa, (size_t) cnt, (size_t) sizeof(double),
            double_compare );

    /* keep only the matchCnt smallest differences */
    for ( i = 0, splitDiff = 0.0; i < matchCnt && i < cnt; i++ )
        splitDiff += aa[i];

#if 0
    for ( i = 0; i < cnt; i++ )
        fprintf(stderr, " %8.2lf", aa[i] );
    if ( cnt )
        fprintf(stderr, "\n");
#endif

    splitDiff *= q_featureFactor;
    if ( cnt == 1 )
        splitDiff *= 2.0;   /* If there is only one donor or acceptor, increase the weighting
                               automatically. Always a good thing. */
    if ( noFrags > 1 )
    {
        splitDiff /= (double) noFrags;
    }

    q1->featureDiff[f1->id] = q2->featureDiff[f2->id] = 0.0;
    if ( q3 )
        q3->featureDiff[f3->id] = 0.0;
    if ( q4 )
        q4->featureDiff[f4->id] = 0.0;

    if ( fcnt1 )
        q1->featureDiff[f1->id] += splitDiff;
    if ( fcnt2 )
        q2->featureDiff[f2->id] += splitDiff;
    if ( fcnt3 )
        q3->featureDiff[f3->id] += splitDiff;
    if ( fcnt4 )
        q4->featureDiff[f4->id] += splitDiff;

    free((char *) aa);
    free((char *) da);
    free((char *) either);
    free((char *) both );

    return;
}
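
The partial-match step in PartialMatchFeatures amounts to sorting the per-atom differences and adding up only the best matchCnt of them. The following is a minimal sketch of that step under stated assumptions: cmp_double and sum_smallest are hypothetical names that do not appear in the listing, and the comparator returns the sign of the comparison so that values lying less than 1.0 apart are still ordered correctly.

```c
#include <stdlib.h>

/* qsort comparator for doubles: returns -1, 0 or 1. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *) a;
    double y = *(const double *) b;

    return (x > y) - (x < y);
}

/* Sort vals[0..n-1] ascending in place and return the sum of the k smallest
 * entries (all of them when k >= n). */
static double sum_smallest(double *vals, int n, int k)
{
    double sum = 0.0;
    int i;

    if (n <= 0 || k <= 0)
        return 0.0;

    qsort(vals, (size_t) n, sizeof(double), cmp_double);
    for (i = 0; i < k && i < n; i++)
        sum += vals[i];
    return sum;
}
```

With the four differences 0.75, 0.25, 2.0 and 0.5 and k set to 2, only 0.25 and 0.5 are kept, so the poorly matched features do not dominate the score of an otherwise good partial match.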

static void CoverConnectedHB(Split *qs, struct CtConnectionTable *ct, double *HB )
{
    CtAtom *A;
    CtAtomBondData *bond;
    int queryMask;
    int aHB;
    int i, j, k, idx, cnt, coverCnt;
    int *Worse;

    aHB = FeatureHBA | FeatureHBD;
    if ( !qs->connectedHBCnt )
    {
        /* first use for this split: record, for each donor/acceptor atom,
           the bonded atoms that are donors/acceptors too */
        qs->connectedHBCnt = (int *) calloc(sizeof(int), ct->atomCount );
        qs->connectedHBAtoms = (int *) calloc(sizeof(int), ct->atomCount * 5 );
        qs->connectedHBTotalCnt = 0;

        for ( i = 0, A = ct->atoms; i < ct->atomCount; i++, A++ )
        {
            queryMask = qs->featureMask[ i ];
            if ( queryMask & aHB )
            {
                for ( cnt = j = 0, bond = A->bond; j < A->bondCount && j < 5;
                      j++, bond++ )
                {
                    queryMask = qs->featureMask[ bond->toAtom ];
                    if ( queryMask & aHB )
                    {
                        idx = i*5 + cnt;
                        qs->connectedHBCnt[i] += 1;
                        qs->connectedHBAtoms[idx] = bond->toAtom;
                        qs->connectedHBTotalCnt++;
                    }
                    cnt++;
                }
            }
        }
    }

    if ( qs->connectedHBTotalCnt == 0 )
        return;

    Worse = (int *) calloc(sizeof(int), ct->atomCount );

    for ( j = 1; j < 5; j++ )
    {
        for ( i = 0; i < ct->atomCount; i++ )
        {
            if ( qs->connectedHBCnt[i] != j )
                continue;
            for ( k = 0; k < qs->connectedHBCnt[i]; k++ )
            {
                idx = i*5 + k;
                if ( HB[i] > HB[ qs->connectedHBAtoms[idx] ] )
                {
                    Worse[i] = 1;
                }
            }
        }
    }

    /* drop every atom that scored worse than a connected donor/acceptor */
    for ( i = 0; i < ct->atomCount; i++ )
    {
        if ( Worse[i] )
            HB[i] = -1.0;
    }

    free((char *) Worse);
    return;
}

static double MeasureClosest(Split *qs, Frag *q1, Split *str, Frag *f1, double *da, double *aa, int
        *r_fcnt )
{
    int *qmask;
    int *smask;
    int i, j, k;
    double best = 99999.0;
    int found = -1;
    double worst;
    int qid, sid;
    int *qMap, *strMap;
    FeatureType qfeature, strFeature;
    double x, y, z;
    double distsq;
    double otherDiff = 0.0;
    double *qryCords, *strCords;
    double attFact;
    double fieldDiff = 0.0;
    double extraDiff = 0.0;
    double featDiff;
    int centAtoms[6];
    int cidx;
    AromSet *qset, *strSet;
    int *covered;
    static int featureCnt[4];
    static int extraFeatureCnt[4];
    int queryHB;
    int strHB;
    int origIdx;

    *r_fcnt = 0;

    featureCnt[0] = featureCnt[1] = featureCnt[2] = featureCnt[3] = 0;
    extraFeatureCnt[0] = extraFeatureCnt[1] = extraFeatureCnt[2] = extraFeatureCnt[3] = 0;
    qmask = qs->featureMask;
    smask = str->featureMask;
    qMap = q1->origMapping;
    strMap = f1->origMapping;

    if ( !q1->cords || !f1->cords )
    {
        return otherDiff;
    }

    covered = (int *) calloc(f1->atomCnt, sizeof(int) );

#ifdef DEBUG_DETAIL
    if ( q_debugfp )
    {
        fprintf(q_debugfp, "\n# Feature comparison Query Id: %d Structure Id: %d\n",
            q1->id + 1, f1->id + 1 );
    }
#endif

    /* do the single atom features first */
    for ( i = 0; i < q1->atomCnt; i++ )
    {
        if ( qmask[ qMap[i] ] == FeatureNone )
            continue;   /* no single atom feature at this atom */
        origIdx = qMap[i];
        qfeature = qmask[qMap[i]];
        for ( k = 0; k < 4; k++ )
        {
            if ( !( qfeature & fMasks[k] ) )
                continue;
            best = 999999.0;
            found = -1;
            worst = (double) featureWeights[k+1] * featureWeights[k+1];

            /* find the closest structure atom carrying the same feature type */
            for ( j = 0; j < f1->atomCnt; j++ )
            {
                if ( !( smask[ strMap[j] ] & fMasks[k] ) )
                    continue;
                strFeature = smask[ strMap[j] ];
#if 0
                /* don't count attachment features in core mode */
                if ( q_coremode && ( strMap[j] == f1->copyBaseAtom || strMap[j]
                        == str2ndAttach ) )
                    continue;
#endif
                qryCords = q1->cords + (i*3);
                strCords = f1->cords + (j*3);
                x = *qryCords - *strCords;
                y = *(qryCords+1) - *(strCords+1);
                z = *(qryCords+2) - *(strCords+2);
                distsq = x*x + y*y + z*z;
                if ( distsq < best )
                {
                    best = distsq;
                    found = j;
                }
#ifdef DEBUG_DETAIL
                if ( q_debugfp )
                    fprintf(q_debugfp, "# feature compare: %d %d type:%d distance: %7.4lf best: %7.4lf from: %d.%d\n",
                        i+1, j+1, k+1, sqrt(distsq), best, q1->id+1, f1->id+1 );
#endif
            }

            if ( found != -1 )
                covered[found] |= fMasks[k];

            attFact = 1.0;
            if ( best > 0.25 )   /* More than 0.5, this causes a penalty, best is a squared */
            {
                if ( q1->AtWts )
                {
                    if ( f1->AtWts && found != -1 )
                        attFact = ( q1->AtWts[i] + f1->AtWts[found] ) / 2.0;
                    else
                        attFact = q1->AtWts[i];
                }
                else if ( f1->AtWts && found != -1 )
                    attFact = f1->AtWts[found];

                if ( best > 3.0625 )   /* worst case distance is greater than 1.75 perfect
                                          mismatch (see GOLD/GASP papers) */
                {
                    featDiff = worst * attFact;
                    fieldDiff += featDiff;
                }
                else
                {
                    featDiff = worst * attFact * (( best - 0.25 ) / 2.8125 );
                    fieldDiff += featDiff;
                }
            }
            else
            {
                featDiff = 0.0;
            }

            if ( qfeature & FeatureHBA )
            {
#if 0
                fprintf(stderr, "HBA %d %d, origIdx: %8.2lf featDiff: %8.2lf\n",
                    i, origIdx, aa[origIdx], featDiff );
#endif
                /* keep the smallest acceptor difference seen for this query atom */
                if ( aa[origIdx] == -1.0 || aa[origIdx] > featDiff )
                    aa[origIdx] = featDiff;
                *r_fcnt += 1;
            }

            if ( qfeature & FeatureHBD )
            {
#if 0
                fprintf(stderr, "HBD %d %d, origIdx: %8.2lf featDiff: %8.2lf\n",
                    i, origIdx, da[origIdx], featDiff );
#endif
                /* keep the smallest donor difference seen for this query atom */
                if ( da[origIdx] == -1.0 || da[origIdx] > featDiff )
                    da[origIdx] = featDiff;
                *r_fcnt += 1;
            }

            if ( qfeature & FeaturePos || qfeature & FeatureNeg )
                otherDiff += featDiff;

#ifdef DEBUG_DETAIL
            if ( q_debugfp )
            {
                fprintf(q_debugfp,
                    "# feature q:%d s:%d ftype:%d best: %7.4lf a:%5.3lf worst: %11.2lf FieldDiff: %9.3lf\n",
                    i+1, found, qmask[ qMap[i] ], best, attFact, sqrt(worst),
                    fieldDiff );
            }
#endif
        }
    }

    /* Now for the extra feature penalty, count all non-covered features */
    for ( j = 0; j < f1->atomCnt; j++ )
    {
        if ( smask[ strMap[j] ] != FeatureNone )
        {
#if 0
            if ( q_coremode && ( strMap[j] == str->copyBaseAtom || strMap[j] ==
                    str2ndAttach ) )
                continue;
#endif
            strFeature = smask[ strMap[j] ];
            for ( k = 0; k < 4; k++ )
            {
                if ( !( strFeature & fMasks[k] ) )
                    continue;
                if ( !( covered[j] & fMasks[k] ) )
                {
                    worst = featureWeights[k+1] * ( (f1->AtWts) ? f1->AtWts[j]
                            : 1.0 );
                    featDiff = (worst * worst * q_extraFeatureFactor );
                    otherDiff += featDiff;
                    extraFeatureCnt[k] += 1;
#ifdef DEBUG_DETAIL
                    if ( q_debugfp )
                        fprintf(q_debugfp, "# missing feature %d,%d %d worst: %11.2lf FieldDiff: %9.3lf\n",
                            f1->id+1, j+1,
                            smask[ strMap[j] ], worst, fieldDiff );
#endif
                }
            }
        }
    }

    free((char *) covered );


    /* end of single atom, now do the aromatic rings */

    /* Find the 5 and 6 membered aromatic rings in the fragments, setup centroids for quick
       comparisons */
    if ( q1->aromCnt == -1 )
    {
        attFact = 1.0;
        q1->aromCnt = 0;
        for ( i = 0, qset = qs->aromSets; i < qs->numArom; i++, qset++ )
        {
            for ( k = cidx = 0; cidx < 6 && k < q1->atomCnt; k++ )
            {
                if ( qset->atoms[ qMap[k] ] )
                {
                    if ( q1->AtWts )
                        attFact = q1->AtWts[k];
                    else
                        attFact = 1.0;
                    centAtoms[cidx] = k;
                    cidx++;
                }
            }

            /* only rings that lie entirely inside this fragment get a centroid */
            if ( qset->numAtoms && qset->numAtoms == cidx )
            {
                if ( !computeCentroid(q1->cords, centAtoms, cidx, &x, &y, &z ) )
                    addCentroid(q1, cidx, attFact, x, y, z );
            }
        }
    }

    if ( f1->aromCnt == -1 )
    {
        f1->aromCnt = 0;
        attFact = 1.0;
        for ( i = 0, strSet = str->aromSets; i < str->numArom; i++, strSet++ )
        {
            for ( k = cidx = 0; cidx < 6 && k < f1->atomCnt; k++ )
            {
                if ( strSet->atoms[ strMap[k] ] )
                {
                    if ( f1->AtWts )
                        attFact = f1->AtWts[k];
                    centAtoms[cidx] = k;
                    cidx++;
                }
            }

            if ( strSet->numAtoms == cidx )
            {
                if ( !computeCentroid(f1->cords, centAtoms, cidx, &x, &y, &z ) )
                    addCentroid(f1, cidx, attFact, x, y, z );
            }
        }
    }


    /* compare the query aromatic rings versus the structure's aromatic rings */
    for ( i = 0; i < q1->aromCnt; i++ )
    {
        best = 99999.0;
        found = 0;
        qryCords = q1->cent + (i*4);
        attFact = 1.0;
        worst = 20.0 * 20.0;
        for ( j = 0; j < f1->aromCnt; j++ )
        {
            strCords = f1->cent + (j*4);
            x = *qryCords - *strCords;
            y = *(qryCords+1) - *(strCords+1);
            z = *(qryCords+2) - *(strCords+2);
            distsq = x*x + y*y + z*z;
            if ( distsq < best )
            {
                found = j + 1;
                best = distsq;
                attFact = *(qryCords+3) * *(strCords+3);
            }
#ifdef DEBUG_DETAIL
            if ( q_debugfp )
                fprintf(q_debugfp, "# arom centroid dist: %8.3lf from: %d.%d \n",
                    sqrt(distsq), q1->id+1, f1->id+1 );
#endif
        }

        if ( best > 0.25 )
        {
            if ( best > 3.0625 )   /* worst case distance is greater than 1.75 perfect mismatch
                                      (see GOLD/GASP papers) */
                featDiff = worst * attFact;
            else
                featDiff = worst * attFact * (( best - 0.25 ) / 2.8125 );
            otherDiff += featDiff;
        }
#ifdef DEBUG_DETAIL
        if ( q_debugfp )
            fprintf(q_debugfp, "# arom centroid q:%d,%d s:%d best:%8.3lf fieldDiff: %8.4lf\n",
                q1->id+1, i, f1->id+1,
                best, fieldDiff );
#endif
    }

    worst = featureWeights[0];
    worst *= worst;

    /* add in penalty for extra aromatic rings in the structure not in the query */
    if ( f1->aromCnt > q1->aromCnt )
        otherDiff += worst * 0.1 * (double) (f1->aromCnt - q1->aromCnt);

#ifdef DEBUG_DETAIL
    if ( q_debugfp )
    {
        fprintf(q_debugfp, "# arom Counts: query : %d structure : %d %s\n",
            q1->aromCnt, f1->aromCnt,
            (q1->aromCnt && q1->aromCnt == 0 ) ? "Missing some rings" : "" );
    }
#endif

    return otherDiff * q_featureFactor;
}
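
The distance-to-penalty rule that MeasureClosest applies to both single-atom features and ring centroids can be stated compactly. The following is a minimal sketch under stated assumptions: feature_penalty is a hypothetical helper that does not appear in the listing, and its arguments correspond to the listing's best (a squared distance), worst and attFact. Squared distances up to 0.25 (0.5 squared) cost nothing, squared distances above 3.0625 (1.75 squared) cost the full weighted penalty, and the penalty rises linearly in the squared distance in between.

```c
/* Penalty for a feature whose nearest counterpart lies at squared distance
 * distsq; worst is the full penalty and att_fact the attenuation weight. */
static double feature_penalty(double distsq, double worst, double att_fact)
{
    const double near_sq = 0.25;     /* 0.5 squared: close enough, no penalty */
    const double far_sq  = 3.0625;   /* 1.75 squared: treated as a full mismatch */

    if (distsq <= near_sq)
        return 0.0;
    if (distsq > far_sq)
        return worst * att_fact;

    /* linear ramp over the 2.8125 wide interval between the two thresholds */
    return worst * att_fact * ((distsq - near_sq) / (far_sq - near_sq));
}
```

Halfway between the thresholds, at a squared distance of 1.65625, the helper returns half of worst times att_fact, which is the value the expression (best - 0.25) / 2.8125 produces in the listing.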

static double compareFeatures(Split *qs, Frag *qry, Split *ss, Frag *str, int qry2ndAttach, int
        str2ndAttach )
{
    int *qmask;
    int *smask;
    int i, j, k;
    double best = 99999.0;
    int found = -1;
    double worst;
    int qid, sid;
    int *qMap, *strMap;
    FeatureType qfeature, strFeature;
    double x, y, z;
    double distsq;
    double *qryCords, *strCords;
    double attFact;
    double fieldDiff = 0.0;
    double extraDiff = 0.0;
    int centAtoms[6];
    int cidx;
    AromSet *qset, *strSet;
    int *covered;
    static double featureContributions[4][MAX_FEATURES];   /* maximum of 200 features per type
                                                              should be more than enough, for the above 4 features */
    static int featureCnt[4];
    static int extraFeatureCnt[4];
    int fidx;

    featureCnt[0] = featureCnt[1] = featureCnt[2] = featureCnt[3] = 0;
    extraFeatureCnt[0] = extraFeatureCnt[1] = extraFeatureCnt[2] = extraFeatureCnt[3] = 0;
    qmask = qs->featureMask;
    smask = ss->featureMask;
    qMap = qry->origMapping;
    strMap = str->origMapping;

    if ( !qry->cords || !str->cords )
    {
        /* print the pointers with %p; %d is not a valid conversion for them */
        fprintf(stderr, "no coords: %p %p\n", (void *) qry->cords, (void *) str->cords );
        return 9999.0 * 9999.0;
    }

    covered = (int *) calloc(str->atomCnt, sizeof(int) );

#ifdef DEBUG_DETAIL
    if ( q_debugfp )
    {
        fprintf(q_debugfp, "\n# Feature comparison Query Id: %d Structure Id: %d\n",
            qry->id + 1, str->id + 1 );
    }
#endif


    /* do the single atom features first */
    for ( i = 0; i < qry->atomCnt; i++ )
    {
        if ( qmask[ qMap[i] ] == FeatureNone )
            continue;   /* no single atom feature at this atom */

        qfeature = qmask[qMap[i]];
        for ( k = 0; k < 4; k++ )
        {
            if ( !( qfeature & fMasks[k] ) )
                continue;
            fidx = featureCnt[k];
            best = 999999.0;
            found = -1;
            worst = (double) featureWeights[k+1] * featureWeights[k+1];
            for ( j = 0; j < str->atomCnt; j++ )
            {
                if ( !( smask[ strMap[j] ] & fMasks[k] ) )
                    continue;
                /* don't count attachment features in core mode */
                if ( q_coremode && ( strMap[j] == str->copyBaseAtom || strMap[j]
                        == str2ndAttach ) )
                    continue;
                qryCords = qry->cords + (i*3);
                strCords = str->cords + (j*3);
                x = *qryCords - *strCords;
                y = *(qryCords+1) - *(strCords+1);
                z = *(qryCords+2) - *(strCords+2);
                distsq = x*x + y*y + z*z;
                if ( distsq < best )
                {
                    best = distsq;
                    found = j;
                }
#ifdef DEBUG_DETAIL
                if ( q_debugfp )
                    fprintf(q_debugfp, "# feature compare: %d %d type:%d distance: %7.4lf best: %7.4lf from:%d.%d\n",
                        i+1, j+1, k+1, sqrt(distsq), best, qry->id+1,
                        str->id+1 );
#endif
            }

            if ( found != -1 )
                covered[found] |= fMasks[k];


attFact =,1.0; 

if ( best > 0.25 ) /* More than 0.5, this causes a penalty, best is 


3g a squared */ 


{ 

ry if (qry->AtWts ) 

2 if ( str-> AtWts && found ! = -1 ) 

35 attFact = ( qry- > AtWts[i] + str- > AtWts[found] ) / 2 .0; 

else 

attFact = qry->AtWts[i]; 

}'= 

else if ( str-> AtWts && found ! = -1 ) 
40 attFact = str-> AtWts[found]; 

if ( best > 3.0625 ) /* worst case distance is greater than 1.75 perfect 
mismatch (see GOLD/GASP papers) */ 

{ 

45 fieldDiff + = worst * attFact; 

featureContributions[k][fidx] = worst * attFact; 

} 

else 
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/ 2.8125); 


fieldDiff + = worst * attFact * (( best - 0.25 ) / 2.8125 ); 
featureContributions[k][fidx] = worst * attFact * ((best - 0.25 ) 


} 

else 

{ 


featureContributions[k][fidx] = 0.0; 


} 

if ( featureCnt[k] < (MAX_FEATURES - 1) ) 

featureCnt[k] + = 1; /* just to avoid core dumps, don't increment if full 

*/ 

#ifdef DEBUG_DETAIL 

if ( q_debugfp ) 

{ 

fprintf(q_debugfp, 

"# feature q:%d s:%d ftype:%d best: %7.41f a:%5.31f 

worst: % 1 1 .21f FieldDiff: %9.31f\n" , 

i+1, found, qmask[ qMap[i] ], best, attFact, sqrt(worst), 

fieldDiff); 

} 

#endif 

} 

} 

    /* Now for the extra feature penalty, count all non-covered features */
    for ( j = 0; j < str->atomCnt; j++ )
    {
        if ( smask[ strMap[j] ] != FeatureNone )
        {
            if ( q_coremode && ( strMap[j] == str->copyBaseAtom || strMap[j] == str2ndAttach ) )
                continue;
            strFeature = smask[ strMap[j] ];
            for ( k = 0; k < 4; k++ )
            {
                if ( !( strFeature & fMasks[k] ) )
                    continue;
                if ( !( covered[j] & fMasks[k] ) )
                {
                    worst = featureWeights[k + 1] * ( (str->AtWts) ? str->AtWts[j] : 1.0 );
                    fieldDiff += (worst * worst * q_extraFeatureFactor );
                    extraDiff += (worst * worst * q_extraFeatureFactor );
                    extraFeatureCnt[k] += 1;
#ifdef DEBUG_DETAIL
                    if ( q_debugfp )
                        fprintf(q_debugfp, "# missing feature %d,%d %d worst: %11.2lf FieldDiff: %9.3lf\n",
                            str->id + 1, j + 1,
                            smask[ strMap[j] ], worst, fieldDiff );
#endif
                }
            }
        }
    }

    free((char *) covered );


    /* Almost the end of the single atom features. If autoscaling is on for features, let's ignore the
       featureDiff calculated so far.

       Auto scaling for features is NOT based upon hev atom count. It's based upon the number of
       features by type.
    */

    if ( q_partialMatch )
    {
        fieldDiff = featureScaling(featureCnt, extraFeatureCnt, (double *) featureContributions,
            q_partialMatch );
        fieldDiff += extraDiff;
    }

    /* end of single atom, now do the aromatic rings */

    /* Find the 5 and 6 membered aromatic rings in the fragments, setup centroids for quick
       comparisons */

    if ( qry->aromCnt == -1 )
    {
        attFact = 1.0;
        qry->aromCnt = 0;
        for ( i = 0, qset = qs->aromSets; i < qs->numArom; i++, qset++ )
        {
            for ( k = cidx = 0; cidx < 6 && k < qry->atomCnt; k++ )
            {
                if ( qset->atoms[ qMap[k] ] )
                {
                    if ( qry->AtWts )
                        attFact = qry->AtWts[k];
                    else
                        attFact = 1.0;
                    centAtoms[cidx] = k;
                    cidx++;
                }
            }
            if ( qset->numAtoms && qset->numAtoms == cidx )
            {
                if ( !computeCentroid(qry->cords, centAtoms, cidx, &x, &y, &z ) )
                    addCentroid(qry, cidx, attFact, x, y, z );
            }
        }
    }
    if ( str->aromCnt == -1 )
    {
        str->aromCnt = 0;
        attFact = 1.0;
        for ( i = 0, strSet = ss->aromSets; i < ss->numArom; i++, strSet++ )
        {
            for ( k = cidx = 0; cidx < 6 && k < str->atomCnt; k++ )
            {
                if ( strSet->atoms[ strMap[k] ] )
                {
                    if ( str->AtWts )
                        attFact = str->AtWts[k];
                    centAtoms[cidx] = k;
                    cidx++;
                }
            }
            if ( strSet->numAtoms == cidx )
            {
                if ( !computeCentroid(str->cords, centAtoms, cidx, &x, &y, &z ) )
                    addCentroid(str, cidx, attFact, x, y, z );
            }
        }
    }

    /* compare the query aromatic rings versus the structure's aromatic rings */
    for ( i = 0; i < qry->aromCnt; i++ )
    {
        best = 99999.0;
        found = 0;
        qryCords = qry->cent + (i*4);
        attFact = 1.0;
        worst = 20.0 * 20.0;
        for ( j = 0; j < str->aromCnt; j++ )
        {
            strCords = str->cent + (j*4);
            x = *qryCords - *strCords;
            y = *(qryCords+1) - *(strCords+1);
            z = *(qryCords+2) - *(strCords+2);
            distsq = x*x + y*y + z*z;
            if ( distsq < best )
            {
                found = j + 1;
                best = distsq;
                attFact = *(qryCords+3) * *(strCords+3);
            }
#ifdef DEBUG_DETAIL
            if ( q_debugfp )
                fprintf(q_debugfp, "# arom centroid dist: %8.3lf from: %d.%d\n",
                    sqrt(distsq), qry->id+1, str->id+1 );
#endif
        }
        if ( best > 0.25 )
        {
            if ( best > 3.0625 ) /* worst case distance is greater than 1.75 perfect mismatch
                                    (see GOLD/GASP papers) */
                fieldDiff += worst * attFact;
            else
                fieldDiff += worst * attFact * (( best - 0.25 ) / 2.8125 );
        }
#ifdef DEBUG_DETAIL
        if ( q_debugfp )
            fprintf(q_debugfp, "# arom centroid q:%d,%d s:%d best:%8.3lf fieldDiff: %8.4lf\n",
                qry->id+1, i, str->id+1,
                best, fieldDiff );
#endif
    }

    worst = 20.0 * 20.0;
    /* add in penalty for extra aromatic rings in the structure not in the query */
    if ( str->aromCnt > qry->aromCnt )
        fieldDiff += worst * 0.1 * (double) (str->aromCnt - qry->aromCnt);

#ifdef DEBUG_DETAIL
    if ( q_debugfp )
    {
        fprintf(q_debugfp, "# arom Counts: query : %d structure : %d %s\n",
            qry->aromCnt, str->aromCnt,
            (str->aromCnt && qry->aromCnt == 0 ) ? "Missing some rings" : "" );
    }
#endif

    return fieldDiff * q_featureFactor;

} 
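
The distance test that compareFeatures applies to each matched feature, and again to each ring centroid, can be isolated in one small function. The following is a minimal sketch and not part of the original listing: the helper name feature_penalty and the two threshold macros are illustrative assumptions, while the constants 0.25, 3.0625 and 2.8125 are the ones used in the listing above.

```c
#include <assert.h>

/* Squared-distance thresholds taken from the listing above. */
#define NEAR_SQ 0.25    /* 0.5 * 0.5: at or below this, no penalty */
#define FAR_SQ  3.0625  /* 1.75 * 1.75: above this, full mismatch  */

/* Hypothetical helper: penalty for one query feature whose nearest
   same-type structure feature lies at squared distance distsq.
   worst is the full-mismatch penalty, attFact the atom weight. */
static double feature_penalty(double distsq, double worst, double attFact)
{
    if (distsq <= NEAR_SQ)
        return 0.0;                       /* close enough: counts as a match */
    if (distsq > FAR_SQ)
        return worst * attFact;           /* too far: perfect mismatch */
    /* in between: linear ramp in squared distance (FAR_SQ - NEAR_SQ = 2.8125) */
    return worst * attFact * ((distsq - NEAR_SQ) / (FAR_SQ - NEAR_SQ));
}
```

With worst = 400.0 (the 20.0 * 20.0 used for rings) and attFact = 1.0, a squared distance of 1.65625 lies halfway up the ramp and yields 200.0.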


/*
   The data is in FeaturePos, FeatureNeg, FeatureHBA and FeatureHBD order
*/
static double featureScaling(int *featureCnts, int *extraFeatureCnts, double *featureContributions, int nbest )
{
    static double *thebest;
    static int maxBest;
    double lowest, clowest;
    int cnt, lowidx;
    double dval;
    double fieldDiff = 0.0;
    double featDiff;
    double fieldIgnored = 0.0;
    double fieldFact;
    int k, idx, j, fidx;

    if ( !thebest || nbest > maxBest )
    {
        if ( thebest )
            free((char *) thebest );
        thebest = (double *) malloc(sizeof(double) * nbest );
        maxBest = nbest;
    }

    featDiff = 0.0;
    for ( k = 0; k < 4; k++ )
    {
        if ( featureCnts[k] == 0 )
            continue;

        /* Find the N lowest contributing features by type.
           Think of this as partial match feature matching, like Unity's
           flexible searching. */
        for ( featDiff = 0.0, lowidx = -1, lowest = 999999999.0, cnt = idx = 0;
              idx < featureCnts[k]; idx++ )
        {
            fidx = (k * MAX_FEATURES) + idx;
            dval = featureContributions[fidx];
            featDiff += dval;
            if ( dval < lowest || cnt < nbest )
            {
                if ( cnt < nbest )
                {
                    if ( dval < lowest )
                    {
                        lowest = dval;
                        lowidx = cnt;
                    }
                    thebest[cnt] = dval;
                    cnt++;
                }
                else
                {
                    thebest[lowidx] = dval;
                    lowest = dval;
                    for ( j = 0; j < nbest; j++ )
                    {
                        if ( thebest[j] < lowest )
                        {
                            lowest = thebest[j];
                            lowidx = j;
                        }
                    }
                }
            }
        }

        /* we are looking at donors and acceptors */
        if ( cnt > 0 )
        {
            if ( k > 1 )
            {
                fieldFact = 2.0 / (double) cnt; /* Mainly to increase the importance
                                                   when only one donor or acceptor exists */
                if ( fieldFact < 0.9 )
                    fieldFact = 0.9;
                for ( j = 0; j < cnt; j++ )
                {
                    fieldDiff += thebest[j] * fieldFact;
                    featDiff -= thebest[j];
                }
#if 0
                fieldFact = (1.0 / ( (double) (cnt+2) * (double) (cnt+1) ) );
#else
                fieldFact = 0.0;
#endif
                if ( cnt > 2 )
                    fieldFact *= 0.5;
                fieldDiff += fieldFact * featDiff;
#if 0
                fprintf(stderr, "field: %8.2lf best: %8.2lf remain: %8.2lf \n", fieldDiff,
                    thebest[0], featDiff);
#endif
            }
            else
            {
                fieldDiff += featDiff;
            }
#if 0
            if ( featureCnts[k] > 2 )
            {
                /* so now what do we do about the 5th - Nth fields.
                   Should they or shouldn't they contribute */
                for ( fieldIgnored = 0.0, j = cnt; j < featureCnts[k]; j++ )
                {
                    fidx = ( k * MAX_FEATURES ) + j;
                    fieldIgnored += featureContributions[fidx];
                }
                fprintf(stderr, "Field ignored total: %8.2lf sqrt is: %8.2lf\n",
                    fieldIgnored, sqrt(fieldIgnored) );
                fprintf(stderr, "type: %d cnt: %d ", k, featureCnts[k] );
                for ( j = 0; j < featureCnts[k]; j++ )
                {
                    fidx = ( k * MAX_FEATURES ) + j;
                    fprintf(stderr, "%7.2lf ", featureContributions[fidx] );
                }
                fprintf(stderr, "\nBest %d: ", cnt);
                for ( j = 0; j < cnt; j++ )
                    fprintf(stderr, "%7.2lf ", thebest[j] );
                fprintf(stderr, "\n");
            }
#endif
        }
    }

    return fieldDiff;
}
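
The partial-match idea described in the comments of featureScaling (score only the N lowest contributions of a feature type, and boost donors and acceptors when few exist) can be illustrated in isolation. This is a hedged sketch under stated assumptions: the helper names sum_n_lowest and partial_match_factor and the DEMO_MAX_VALS bound are hypothetical, and the selection uses a plain repeated scan rather than the running buffer kept by the listing. Only the 2.0 / cnt factor and its 0.9 floor are taken from the code above.

```c
#include <assert.h>

#define DEMO_MAX_VALS 200  /* assumed upper bound for this sketch only */

/* Hypothetical helper: sum of the n smallest entries of vals[0..count-1].
   Uses repeated selection with a "taken" flag array; the input is not modified. */
static double sum_n_lowest(const double *vals, int count, int n)
{
    int taken[DEMO_MAX_VALS] = {0};
    double total = 0.0;
    int picked, i, low;

    if (!vals || count <= 0 || count > DEMO_MAX_VALS || n <= 0)
        return 0.0;
    if (n > count)
        n = count;
    for (picked = 0; picked < n; picked++) {
        low = -1;
        for (i = 0; i < count; i++) {
            if (taken[i])
                continue;
            if (low < 0 || vals[i] < vals[low])
                low = i;
        }
        taken[low] = 1;
        total += vals[low];
    }
    return total;
}

/* Weighting applied to donor/acceptor contributions in the listing:
   2.0 / cnt, but never below 0.9. */
static double partial_match_factor(int cnt)
{
    double f = 2.0 / (double) cnt;
    return (f < 0.9) ? 0.9 : f;
}
```

For contributions {400, 0, 100, 50} and N = 2, the two lowest values sum to 50; a single donor is weighted by 2.0, two donors by 1.0, and four donors by the 0.9 floor.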

static int SearchForFeatures(Split *S)
{
    int aromHit, featureHit;
    int numFeatures;
    FeaturePattern *fptr;
    int oxygen, nitrogen, sulfur;
    int ring_oxygen, ring_nitrogen, ring_sulfur;
    int nonSingleRingBond;
    CtAtom *atom;
    CtBond *bond;
    int i, j, k;
    int bcnt;
    int strInit;
    CtBondTypeDef bondType;
    CtSimpleBondTypeDef simpleTypes;
    struct Srch2Hits *hits;
    int nhits, hitidx;
    int atomId;
    int *atoms;
    int nonSingleRingBonds;
    AromSet *aset;
    int alreadyFound;
    char *regid;

    if ( !S || !S->ct )
        return -1;

    aromHit = featureHit = 0;
    oxygen = nitrogen = sulfur = nonSingleRingBond = 0;
    ring_oxygen = ring_nitrogen = ring_sulfur = 0;
    regid = (char *) 0;
    DB_CT_GET_CT_ATTR(S->ct, CtCtRegId, &regid );

    fptr = InitFeaturePatterns(&numFeatures); /* it won't re-initialize */
    DB_CT_UTL_FIND_RINGS(S->ct);

    for ( i = 0, atom = S->ct->atoms; i < S->ct->atomCount; i++, atom++ )
    {
        if ( atom->class != CtAtomElement )
            continue;
        if ( atom->id.atomicNumber == OXYGEN )
        {
            oxygen++;
            if ( AB_IN_RING(atom) )
                ring_oxygen++;
        }
        else if ( atom->id.atomicNumber == NITROGEN )
        {
            nitrogen++;
            if ( AB_IN_RING(atom) )
                ring_nitrogen++;
        }
        else if ( atom->id.atomicNumber == SULFUR )
        {
            sulfur++;
            if ( AB_IN_RING(atom) )
                ring_sulfur++;
        }
    }

    for ( i = nonSingleRingBonds = 0, bond = S->ct->bonds;
          i < S->ct->bondCount && nonSingleRingBonds == 0;
          i++, bond++ )
    {
        if ( AB_IN_RING(bond) )
        {
            if ( bond->simpleBondType == CtSimpleBondTypeNotSimple )
            {
                bondType = DB_CT_GET_BOND_TYPE(S->ct, STD_ID(i), &bcnt,
                    &simpleTypes );
                if ( bondType != CtBondTypeSingle )
                    nonSingleRingBonds++;
            }
            else if ( bond->simpleBondType != CtSimpleBondTypeSingle )
                nonSingleRingBonds++;
        }
    }

    S->numArom = 0;
    S->aromSets = (AromSet *) 0;
    S->featureMask = (int *) calloc(sizeof(int), S->atomCount);
    if ( nonSingleRingBonds )
        S->aromMask = (int *) calloc(sizeof(int), S->atomCount );

    for ( i = strInit = 0; i < numFeatures; i++, fptr++ )
    {
        if ( fptr->weight == 0 )
            continue; /* think of it as commented out */
        if ( fptr->f_type == FeatureArom && nonSingleRingBonds == 0 )
            continue; /* Can't hit the feature aromatic, no non-single ring bonds */
        if ( q_useFeatureCharges == 0 && ( fptr->f_type == FeaturePos || fptr->f_type == FeatureNeg ) )
            continue;
        if ( fptr->atomicId > 0 )
        {
            if ( fptr->atomicId == OXYGEN && ( oxygen == 0 || fptr->ringIndicator == 1 && ring_oxygen == 0 ) )
                continue;
            if ( fptr->atomicId == NITROGEN && ( nitrogen == 0 || fptr->ringIndicator == 1 && ring_nitrogen == 0 ) )
                continue;
            if ( fptr->atomicId == SULFUR && ( sulfur == 0 || fptr->ringIndicator == 1 && ring_sulfur == 0 ) )
                continue;
        }

        hits = DB_SRCH2_SEARCH_PATTERN( fptr->pattern, S->ct, strInit );
        strInit = 1;
        nhits = 0;
        if ( hits )
        {
            nhits = DB_SRCH2_GET_HIT_COUNT(hits);
            if ( !nhits )
                DB_SRCH2_FREE_HITS(hits);
        }
        if ( !nhits )
            continue;

        atoms = (int *) 0;
        /* get the atoms which matched and store accordingly, depending upon the feature type */
        if ( fptr->f_type == FeatureArom )
        {
            for ( hitidx = 0; hitidx < nhits; hitidx++ )
            {
                atoms = (int *) calloc(S->ct->atomCount, sizeof(int) );
                /* store the atoms which define the centroid */
                for ( j = 1; j <= fptr->ct->atomCount; j++ )
                {
                    atomId = DB_SRCH2_GET_ATOM_MAPPING(j, hits, hitidx, 0);
                    if ( !atomId )
                    {
                        UTL_ERROR_CLEAR();
                        continue;
                    }
                    atomId--;
#ifdef DEBUG_DETAIL
                    if ( q_debugfp )
                        fprintf(q_debugfp, "# feature %s atom:%d ftype:%d rule:%d\n",
                            regid, atomId+1, (int) fptr->f_type, i+1 );
#endif
                    S->aromMask[atomId] = fptr->weight;
                    atoms[atomId] = fptr->weight;
                }
                S->aromSets = (AromSet *) DB_CT_UTL_RECALLOC((char *) S->aromSets,
                    S->numArom * sizeof(AromSet),
                    (S->numArom+1) * sizeof(AromSet) );
                aset = S->aromSets + S->numArom;
                S->numArom++;
                aset->atoms = atoms;
                aset->numAtoms = fptr->ct->atomCount;
            }
        }
        else
        {
            for ( hitidx = 0; hitidx < nhits; hitidx++ )
            {
                atomId = DB_SRCH2_GET_ATOM_MAPPING(1, hits, hitidx, 0 );
                if ( !atomId )
                {
                    UTL_ERROR_CLEAR();
                    continue;
                }
                atomId--; /* make it base 0 */
#ifdef DEBUG_DETAIL
                if ( q_debugfp )
                    fprintf(q_debugfp, "# feature %s atom:%d ftype:%d rule:%d\n",
                        regid, atomId+1, (int) fptr->f_type, i+1 );
#endif
                S->featureMask[atomId] |= fptr->f_type;
            }
        }
        DB_SRCH2_FREE_HITS(hits);
    }

    return 0;
}
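
SearchForFeatures records single-atom features by OR-ing a feature type into a per-atom mask, and compareFeatures later tests the mask with a bitwise AND. The sketch below shows that bookkeeping in isolation. It is illustrative only: the DEMO_ bit values and the helper names are assumptions, since the actual FeatureType constants and fMasks table are not shown in this part of the listing.

```c
#include <assert.h>

/* Assumed bit values for illustration; the listing's real FeatureType
   constants are defined elsewhere and may differ. */
enum {
    DEMO_NONE = 0,
    DEMO_POS  = 1 << 0,
    DEMO_NEG  = 1 << 1,
    DEMO_HBA  = 1 << 2,
    DEMO_HBD  = 1 << 3
};

/* Record that atom carries feature ftype; an atom may carry several. */
static void demo_mark(int *mask, int atom, int ftype)
{
    mask[atom] |= ftype;
}

/* Nonzero when atom carries feature ftype. */
static int demo_has(const int *mask, int atom, int ftype)
{
    return (mask[atom] & ftype) != 0;
}
```

Because each type occupies its own bit, a hydroxyl oxygen can be flagged as both acceptor and donor in one integer, and an unflagged atom compares equal to DEMO_NONE.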


static FeaturePattern *InitFeaturePatterns(int *r_numPatterns)
{
    static Srch2Control sctrl[1];
    static int numPatterns;
    struct CtConnectionTable *ct;
    FeaturePattern *fptr;
    FeaturePattern *fpats;
    static FeatureSetName currentSet;

    static FeaturePattern Unityfpats[] = {
    { FeatureArom, 20, 0, 0, "Hev[1]:Hev:Hev:Hev:Hev:Hev:@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]=[r]Hev-[r]Hev=[r]Hev-[r]Hev=[r]Hev-[r]@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]:Hev:Hev:Hev:Hev:@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]=[r]Hev-[r]Hev=[r]Hev-[r]Hev-[r]@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]:[r]Hev-[r]Hev=[r]Hev-[r]Hev-[r]@1" },
    { FeaturePos, 200, 0, 0, "Any[+;not=Any*~Any[-]]" },
    { FeaturePos, 200, NITROGEN, 0, "N[not=N*Hev:=#Any,N*O](Any)(Any)Any" },
    { FeaturePos, 200, NITROGEN, 0, "N[not=N*~Any[-]](Any)(Any)(Any)Any" },
    { FeaturePos, 200, NITROGEN, 1,
      "N[1:NOT=N*~Any[-1]](:Hev:Hev:Hev:Hev:Hev:@1)Any[not=O[f]-N]" },
    { FeaturePos, 200, NITROGEN, 0,
      "N[not=N*~Any[-],N(=O)~O[f]](=Any)(~Any)~Any" },
    { FeaturePos, 200, NITROGEN, 0, "N[f;not=N*Hev:=#Any](Any)Any" },
    { FeaturePos, 200, NITROGEN, 0,
      "N[F](Hc)(Hc)C(=N[F]Hc)Any[IS=C*,N*[f](Any[is=H,C])(Any[is=H,C])(Any[is=H,C])]{Hc:H,C[NOT=C*=#Any]}" },
    { FeaturePos, 200, NITROGEN, 0,
      "C[1:F](:N[F]:C(:C(:N:@1Hc)Any)Any)Any{Hc:H|C[NOT=C*=:#Any]}" },
    { FeaturePos, 200, NITROGEN, 1, "N[1:f](C):C:N[f](C):C:C:@1" },
    { FeatureNeg, 200, OXYGEN, 0,
      "O[is=O*H,O*[f]Hev]-:Hev[is=C*=:O,S*(=:O)(=:O)]" },
    { FeatureNeg, 200, OXYGEN, 0,
      "O[is=O*H,O*[f]Hev]P(=O)(O[is=O*H,O*[f]Hev])OHev" },
    { FeatureNeg, 200, OXYGEN, 0, "O[is=O*H,O*[f]Hev]P(=O)(OHev)OHev" },
    { FeatureNeg, 200, OXYGEN, 0,
      "O[is=O*H,O*[f]Hev]P(=O)(O[is=O*H,O*[f]Hev])CHev" },
    { FeatureNeg, 200, OXYGEN, 0, "O[is=O*H,O*[f]Hev]P(=O)(OHev)CHev" },
    { FeatureNeg, 200, OXYGEN, 0, "O[is=O*H]P[f](=O)C" },
    { FeatureNeg, 200, NITROGEN, 1,
      "Any[is=C[1]:NH:N:N:N:@1,C[1]:N:NH:N:N:@1,C[1]:N:N:NH:N:@1,C[1]:N:N:N:NH:@1]" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]=Any[not=S,P,N(=O[f])~O[f]](Any)Any" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]~Any[is=S,P](Any[not=O])Any[not=O]" },
    { FeatureHBA, 100, NITROGEN, 0, "N[f](:Any):Any" },
    { FeatureHBA, 100, NITROGEN, 1, "N[1]H:N[f]:Z:Z:Z:@1{Z:C,N}" },
    { FeatureHBA, 100, NITROGEN, 1, "N[1]H:C:N[f]:Any:Any:@1" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]C:Any" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]HC[not=C=Any]-:Any" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f](Z)Z{Z:C[not=C=Any]}" },
    { FeatureHBA, 100, OXYGEN, 1, "O[1:f]-:Z=:Z-:Z=:Z-:@1{Z:Any[is=C,N]}" },
    { FeatureHBA, 100, OXYGEN, 1,
      "O[not=O[1]Any[is=C,N]=Any[is=C,N]Any[is=C,N]=Any[is=C,N]@1](Any)C=Any[is=C,N]" },
    { FeatureHBA, 100, NITROGEN, 0,
      "N[f]H(Z)C[not=C=Het;is=C:Any,CHevN[f](Zz)Zz]{Z:C[not=C=Het]|N[not=NC=Het]|O[not=OC=O]|S(=O)=O|H}{Zz:H|N|O|C[not=C:=Hev]}" },
    { FeatureHBA, 100, NITROGEN, 0,
      "N[f](Z)(Z)C[not=C=Het;is=C:Any,CHevN[f](Zz)Zz]{Z:C[not=C=Het]|N[not=NC=Het]|O[not=OC=O]|S(=O)=O}{Zz:H|N|O|C[not=C:=Hev]}" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f](Any[is=H,C])C=O" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]-:C~O[f]" },
    { FeatureHBA, 100, NITROGEN, 0, "NH=C[not=CN]" },
    { FeatureHBA, 100, NITROGEN, 0, "N[f](~Hev)=C[not=CN]" },
    { FeatureHBA, 100, NITROGEN, 0,
      "N[f](=C[is=NC*N,NC*C,NC*H])Hev[is=Hev=O,Hev=S,C#N,CN(~O[f])~O[f]]" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]~N(Any)~O[f]" },
    { FeatureHBA, 100, OXYGEN, 0, "O~Any[is=S,P](~O)~O" },
    { FeatureHBD, 100, NITROGEN, 0,
      "N[not=C[1]:N*:N:N:N:@1,C[1]:N:N*:N:N:@1]H~[!type=3]Any" },
    { FeatureHBD, 100, OXYGEN, 0, "OHAny[not=C=O]" },
    { FeatureHBD, 100, NITROGEN, 0,
      "N[f](Hev[not=Any=O,Any=S,C#N,N(~O[f])~O[f]])=C" },
    { FeatureHBD, 100, NITROGEN, 0,
      "N[f](:C[1:not=COH,CSH]):C[not=COH,CSH]:C:C[not=COH,CSH]:C:@1" },
    { FeatureHBD, 100, NITROGEN, 1,
      "N[1:f;not=C[1]:N*:N:N:N:@1,C[1]:N:N*:N:N:@1]:Any:Any:N(Any):Any:@1" },
    { FeatureHBD, 100, NITROGEN, 1,
      "N[1:f;not=C[1]:N*:N:N:N:@1,C[1]:N:N*:N:N:@1]:Any[1:not=N]:Any[is=C,N]:Any[is=C,N]:NH:@1" },
    { FeatureHBD, 100, NITROGEN, 0, "N[f](:C(Any[is=O,S]H)):Any:Any:Any" },
    { FeatureHBD, 100, NITROGEN, 0, "N[f](:C:C:C(Any[is=O,S]H)):Any" },
    { FeatureHBD, 100, NITROGEN, 0,
      "N[f](Ya)(Ya)Ya{Ya:Any[not=H,C=O,C=N,S(=O)(=O)Any]}" },
    { FeatureHBD, 100, OXYGEN, 0, "O[f]~Any[is=S,P](~OH)(~O)" },
    { FeatureHBD, 100, SULFUR, 0,
      "S[f]HZ{Z:C[not=C=O]|S[not=S~O]|N[not=N~O]}" },
    { FeatureNone, -1, 0, 0, (char *) 0 }
    };


    static FeaturePattern Unityfpats_WeLike[] = {
    { FeatureArom, 20, 0, 0, "Hev[1]:Hev:Hev:Hev:Hev:Hev:@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]=[r]Hev-[r]Hev=[r]Hev-[r]Hev=[r]Hev-[r]@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]:Hev:Hev:Hev:Hev:@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]=[r]Hev-[r]Hev=[r]Hev-[r]Hev-[r]@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]:[r]Hev-[r]Hev=[r]Hev-[r]Hev-[r]@1" },
    { FeaturePos, 200, 0, 0, "Any[+;not=Any*~Any[-]]" },
    { FeaturePos, 200, NITROGEN, 0, "N[not=N*Hev:=#Any,N*O](Any)(Any)Any" },
    { FeaturePos, 200, NITROGEN, 0, "N[not=N*~Any[-]](Any)(Any)(Any)Any" },
    { FeaturePos, 200, NITROGEN, 1,
      "N[1:NOT=N*~Any[-1]](:Hev:Hev:Hev:Hev:Hev:@1)Any[not=O[f]-N]" },
    { FeaturePos, 200, NITROGEN, 0,
      "N[not=N*~Any[-],N(=O)~O[f]](=Any)(~Any)~Any" },
    { FeaturePos, 200, NITROGEN, 0, "N[f;not=N*Hev:=#Any](Any)Any" },
    { FeaturePos, 200, NITROGEN, 0,
      "N[F](Hc)(Hc)C(=N[F]Hc)Any[IS=C*,N*[f](Any[is=H,C])(Any[is=H,C])(Any[is=H,C])]{Hc:H,C[NOT=C*=#Any]}" },
    { FeaturePos, 200, NITROGEN, 0,
      "C[1:F](:N[F]:C(:C(:N:@1Hc)Any)Any)Any{Hc:H|C[NOT=C*=:#Any]}" },
    { FeaturePos, 200, NITROGEN, 1, "N[1:f](C):C:N[f](C):C:C:@1" },
    { FeatureNeg, 200, OXYGEN, 0,
      "O[is=O*H,O*[f]Hev]-:Hev[is=C*=:O,S*(=:O)(=:O)]" },
    { FeatureNeg, 200, OXYGEN, 0,
      "O[is=O*H,O*[f]Hev]P(=O)(O[is=O*H,O*[f]Hev])OHev" },
    { FeatureNeg, 200, OXYGEN, 0, "O[is=O*H,O*[f]Hev]P(=O)(OHev)OHev" },
    { FeatureNeg, 200, OXYGEN, 0,
      "O[is=O*H,O*[f]Hev]P(=O)(O[is=O*H,O*[f]Hev])CHev" },
    { FeatureNeg, 200, OXYGEN, 0, "O[is=O*H,O*[f]Hev]P(=O)(OHev)CHev" },
    { FeatureNeg, 200, OXYGEN, 0, "O[is=O*H]P[f](=O)C" },
    { FeatureNeg, 200, NITROGEN, 1,
      "Any[is=C[1]:NH:N:N:N:@1,C[1]:N:NH:N:N:@1,C[1]:N:N:NH:N:@1,C[1]:N:N:N:NH:@1]" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]=Any[not=S,P,N(=O[f])~O[f]](Any)Any" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]~Any[is=S,P](Any[not=O])Any[not=O]" },
    { FeatureHBA, 100, NITROGEN, 0, "N[f](:Any):Any" },
    { FeatureHBA, 100, NITROGEN, 1, "N[1]H:N[f]:Z:Z:Z:@1{Z:C,N}" },
    { FeatureHBA, 100, NITROGEN, 1, "N[1]H:C:N[f]:Any:Any:@1" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]C:Any" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]HC[not=C=Any]-:Any" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f](Z)Z{Z:C[not=C=Any]}" },
    { FeatureHBA, 100, OXYGEN, 1, "O[1:f]-:Z=:Z-:Z=:Z-:@1{Z:Any[is=C,N]}" },
    { FeatureHBA, 100, OXYGEN, 1,
      "O[not=O[1]Any[is=C,N]=Any[is=C,N]Any[is=C,N]=Any[is=C,N]@1](Any)C=Any[is=C,N]" },
    { FeatureHBA, 0, NITROGEN, 0,
      "N[f]H(Z)C[not=C=Het;is=C:Any,CHevN[f](Zz)Zz]{Z:C[not=C=Het]|N[not=NC=Het]|O[not=OC=O]|S(=O)=O|H}{Zz:H|N|O|C[not=C:=Hev]}" },
    { FeatureHBA, 0, NITROGEN, 0,
      "N[f](Z)(Z)C[not=C=Het;is=C:Any,CHevN[f](Zz)Zz]{Z:C[not=C=Het]|N[not=NC=Het]|O[not=OC=O]|S(=O)=O}{Zz:H|N|O|C[not=C:=Hev]}" },
    { FeatureHBA, 0, OXYGEN, 0, "O[f](Any[is=H,C])C=O" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]-:C~O[f]" },
    { FeatureHBA, 100, NITROGEN, 0, "NH=C[not=CN]" },
    { FeatureHBA, 100, NITROGEN, 0, "N[f](~Hev)=C[not=CN]" },
    { FeatureHBA, 100, NITROGEN, 0,
      "N[f](=C[is=NC*N,NC*C,NC*H])Hev[is=Hev=O,Hev=S,C#N,CN(~O[f])~O[f]]" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]~N(Any)~O[f]" },
    { FeatureHBA, 100, OXYGEN, 0, "O~Any[is=S,P](~O)~O" },
    { FeatureHBD, 100, NITROGEN, 0,
      "N[not=C[1]:N*:N:N:N:@1,C[1]:N:N*:N:N:@1]H~[!type=3]Any" },
    { FeatureHBD, 100, OXYGEN, 0, "OHAny[not=C=O]" },
    { FeatureHBD, 100, NITROGEN, 0,
      "N[f](Hev[not=Any=O,Any=S,C#N,N(~O[f])~O[f]])=C" },
    { FeatureHBD, 0, NITROGEN, 0,
      "N[f](:C[1:not=COH,CSH]):C[not=COH,CSH]:C:C[not=COH,CSH]:C:@1" },
    { FeatureHBD, 0, NITROGEN, 1,
      "N[1:f;not=C[1]:N*:N:N:N:@1,C[1]:N:N*:N:N:@1]:Any:Any:N(Any):Any:@1" },
    { FeatureHBD, 0, NITROGEN, 1,
      "N[1:f;not=C[1]:N*:N:N:N:@1,C[1]:N:N*:N:N:@1]:Any[1:not=N]:Any[is=C,N]:Any[is=C,N]:NH:@1" },
    { FeatureHBD, 0, NITROGEN, 0, "N[f](:C(Any[is=O,S]H)):Any:Any:Any" },
    { FeatureHBD, 0, NITROGEN, 0, "N[f](:C:C:C(Any[is=O,S]H)):Any" },
    { FeatureHBD, 0, NITROGEN, 0,
      "N[f](Ya)(Ya)Ya{Ya:Any[not=H,C=O,C=N,S(=O)(=O)Any]}" },
    { FeatureHBD, 100, OXYGEN, 0, "O[f]~Any[is=S,P](~OH)(~O)" },
    { FeatureHBD, 100, SULFUR, 0,
      "S[f]HZ{Z:C[not=C=O]|S[not=S~O]|N[not=N~O]}" },
    { FeatureNone, -1, 0, 0, (char *) 0 }
    };

/*
From Sybyl 6.71/Unity 4.21 $TA_3DB/sln3d_macros.def

The structure above assumes first atom is the important atom, so write the sln the correct way the first
time.

define::Donor_Atom[name; target; rules; connection]

sln=N[not=C[1]:N*:N:N:N:@1,C[1]:N:N*:N:N:@1]H~[!type=3]Any;
features=1;

sln=OHAny[not=C=O];
features=1;

sln=N[f](Hev[not=Any=O,Any=S,C#N,N(~O[f])~O[f]])=C;
features=1;

sln=C[1:not=COH,CSH]:N[f]:C[not=COH,CSH]:C:C[not=COH,CSH]:C:@1;
features=2;

sln=Any[1]:N(Any):Any:N[f;not=C[1]:N*:N:N:N:@1,C[1]:N:N*:N:N:@1]:Any:@1;
features=5;

sln=Any[1:not=N]:Any[is=C,N]:Any[is=C,N]:NH:N[f;not=C[1]:N*:N:N:N:@1,C[1]:N:N*:N:N:@1]:@1;

features=6;

sln=Any:Any:Any:N[f]:C(Any[is=O,S]H);
features=4;

sln=Any:N[f]:C:C:C(Any[is=O,S]H);
features=2;

sln=N[f](Ya)(Ya)Ya{Ya:Any[not=H,C=O,C=N,S(=O)(=O)Any]};
features=1;

sln=O[f]~Any[is=S,P](~OH)(~O);
features=1;

sln=S[f]HZ{Z:C[not=C=O]|S[not=S~O]|N[not=N~O]};
features=1;
features=::name::_DL_1,
enddefine

define::Acceptor_Atom[name; target; rules; connection]

sln=O[f]=Any[not=S,P,N(=O[f])~O[f]](Any)Any;
features=1;

sln=O[f]~Any[is=S,P](Any[not=O])Any[not=O];
features=1;

sln=Any:N[f]:Any;
features=2;

sln=Z[1]:Z:Z:NH:N[f]:@1{Z:C|N};
features=4;

sln=Any[1]:NH:C:N[f]:Any:@1;
features=2;

sln=O[f]C:Any;
features=1;

sln=O[f]HC[not=C=Any]-:Any;
features=1;

sln=ZO[f]Z{Z:C[not=C=Any]};
features=2;

sln=Z[1]-:O[f]-:Z=:Z-:Z=:@1{Z:Any[is=C,N]};
features=2;

sln=O[not=O[1]Any[is=C,N]=Any[is=C,N]Any[is=C,N]=Any[is=C,N]@1](Any)C=Any[is=C,N];
features=1;

sln=N[f]H(Z)C[not=C=Het;is=C:Any,CHevN[f](Zz)Zz]{Z:C[not=C=Het]|N[not=NC=Het]|O[not=OC=O]|S(=O)=O|H}{Zz:H|N|O|C[not=C:=Hev]};
features=1;

sln=N[f](Z)(Z)C[not=C=Het;is=C:Any,CHevN[f](Zz)Zz]{Z:C[not=C=Het]|N[not=NC=Het]|O[not=OC=O]|S(=O)=O}{Zz:H|N|O|C[not=C:=Hev]};
features=1;

sln=O[f](Any[is=H,C])C=O;
features=1;

sln=O[f]-:C~O[f];
features=1;

sln=NH=C[not=CN];
features=1;

sln=Hev~N[f]=C[not=CN];
features=2;

sln=Hev[is=Hev=O,Hev=S,C#N,CN(~O[f])~O[f]]N[f]=C[is=NC*N,NC*C,NC*H];
features=2;

sln=AnyN(~O[f])~O[f];
features=3;

sln=O~Any[is=S,P](~O)~O;
features=1;
features=::name::_AL_1,
enddefine

*/


    static FeaturePattern orig_top_fpats[] = {
    { FeatureArom, 20, 0, 0, "Hev[1]:Hev:Hev:Hev:Hev:Hev:@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]=[r]Hev-[r]Hev=[r]Hev-[r]Hev=[r]Hev-[r]@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]:Hev:Hev:Hev:Hev:@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]=[r]Hev-[r]Hev=[r]Hev-[r]Hev-[r]@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]:[r]Hev-[r]Hev=[r]Hev-[r]Hev-[r]@1" },
    { FeaturePos, 200, 0, 0, "Any[+;not=Any*~Any[-]]" },
    { FeaturePos, 200, NITROGEN, 0, "N[not=N*Hev:=#Any,N*O](Any)(Any)Any" },
    { FeaturePos, 200, NITROGEN, 0, "N[not=N*~Any[-]](Any)(Any)(Any)Any" },
    { FeaturePos, 200, NITROGEN, 0,
      "N[1:NOT=N*~Any[-1]](:Hev:Hev:Hev:Hev:Hev:@1)Any[not=O[f]-N]" },
    { FeaturePos, 200, NITROGEN, 0,
      "N[not=N*~Any[-],N(=O)~O[f]](=Any)(~Any)~Any" },
    { FeaturePos, 200, NITROGEN, 0, "N[f;not=N*Hev:=#Any](Any)Any" },
    { FeaturePos, 200, NITROGEN, 0,
      "N[F](Hc)(Hc)C(=N[F]Hc)Any[IS=C*,N*[f](Any[is=H,C])(Any[is=H,C])(Any[is=H,C])]{Hc:H,C[NOT=C*=#Any]}" },
    { FeaturePos, 200, NITROGEN, 1,
      "C[1:F](:N[F]:C(:C(:N:@1Hc)Any)Any)Any{Hc:H|C[NOT=C*=:#Any]}" },
    { FeaturePos, 200, NITROGEN, 1, "N[1:f](C):C:N[f](C):C:C:@1" },
    { FeatureNeg, 200, OXYGEN, 0,
      "O[is=O*H,O*[f]Hev]-:Hev[is=C*=:O,S*(=:O)(=:O)]" },
    { FeatureNeg, 200, OXYGEN, 0,
      "O[is=O*H,O*[f]Hev]P(=O)(O[is=O*H,O*[f]Hev])OHev" },
    { FeatureNeg, 200, OXYGEN, 0, "O[is=O*H,O*[f]Hev]P(=O)(OHev)OHev" },
    { FeatureNeg, 200, OXYGEN, 0,
      "O[is=O*H,O*[f]Hev]P(=O)(O[is=O*H,O*[f]Hev])CHev" },
    { FeatureNeg, 200, OXYGEN, 0, "O[is=O*H,O*[f]Hev]P(=O)(OHev)CHev" },
    { FeatureNeg, 200, OXYGEN, 0, "O[is=O*H]P[f](=O)C" },
    { FeatureNeg, 200, NITROGEN, 1,
      "Any[is=C[1]:NH:N:N:N:@1,C[1]:N:NH:N:N:@1,C[1]:N:N:NH:N:@1,C[1]:N:N:N:NH:@1]" },
    { FeatureHBA, 100, OXYGEN, 0,
      "O[is=O*=Any,O(Any)Any,O[f]Any,O[f](H)C=O,O[f]C=O;not=O*=:-N,O*[!r](Hev)Any=Het]" },
    { FeatureHBA, 100, OXYGEN, 0, "O[is=O*=NO,O*N=O,O*=N=O,O:N:O]" },
{ FeatureHBA, 100, NITROGEN, 0, 

"N[is=N*(Any)(Any)Any,N*(Any)Any,N*[f](:Any):Any,N*#C,N*[l:f]:C:NH:C:C:@l,N*[l:f]H:C: 

N[f]:C:C:@l,N*[l:r]H:N[fl:C:C:C:@l,N^^ 

f]H:N[f]:N[f]:N[f]:C:@l,N*H=C,N*[f](Any)=C;not=N*(Any)(Any)Any[not=S]:=#0,N*C(=S)N, 
N*(Any)(Any)C( = S)C,N*(Any)(Any)(Any)Any,N*(Any)(Any)C:Hev,N*[f]HC:Hev,N*(Any[is = H,C 
])=C(N(Any[is=H,C])(C))(N(Any[is=H,C])(Any[is=H,C])),N(Any[is=H,C])=C(N*(Any[is=H,C]) 
(C))(N(Any[is-H,C])(Any[is=H ) C])),N(Any[is=H,C])=C(N(Any[is=H,C])(C))(N*(Any[is-H,C])( 
Any[is=H,C])),N*(:Hev)(:Hev):-Hev,N*(=0)0]" }, 

    { FeatureHBA, 100, NITROGEN, 1, "N[1]C[2]:N:C:N:C(:@2)C(=O)NHC=@1" },
    { FeatureHBA, 100, NITROGEN, 0, "N[is=N*=N=N,N*(=N)=N]" },
    { FeatureHBA, 100, NITROGEN, 0, "N[is=N*(C)=NC]" },
    { FeatureHBA, 100, NITROGEN, 0,
      "N[is=N*(=C)N,N*[not=N*C=Het,N*C:Hev]N=C]" },
    { FeatureHBA, 100, SULFUR, 0,
      "S[is=S*[f]HAny,S*[f](Hev)Hev,S*=C(N)(N);not=S*Any~O]" },
    { FeatureHBD, 100, OXYGEN, 0, "OHAny[not=C=O,P,S]" },
    { FeatureHBD, 100, NITROGEN, 0, "NH" },
    { FeatureHBD, 100, SULFUR, 0, "SH" },

{ FeatureHBD, 100, NITROGEN, 1, 
"N[is = N*[l] = CNHC = C@l,N[l]:C:NH:C:C:@l,Nni:fl:N[f]H:C:C:C:@l,N*[l:f]:N[f]H:C:C:N[fl 
:@l,N*[l:fl:N[n:N[flH:C:C:@l,N*[l:l]:C:C:N[fl:N[f]H:@l,N*[l:f]:N[f]H:C:N[f]:C:@ 
:N[f]:N[f]H:C:@l,N*[l:fl:N[f]H:C:N[f]:C:@l,N*[l:f]:C:N[i]H:N:C:@l,N*[l:f]:C:N[f]H:C 
}, 

    { FeatureHBD, 100, NITROGEN, 0,
      "N[not=N*Hev=#:Het,N*O,N*Hev:Hev](Hev)(Hev)Hev" },
{ FeatureNone, -1, 0, 0, (char *) 0 } 

    };

    if ( numPatterns && currentSet == q_featureSet )
    {
        *r_numPatterns = numPatterns;
        if ( q_featureSet == UseUnityFeatures )
            return Unityfpats;
        else if ( q_featureSet == UseTopomerFeatures )
            return orig_top_fpats;
        else
            return Unityfpats_WeLike;
    }

    if ( q_featureSet == UseUnityFeatures )
        fpats = Unityfpats;
    else if ( q_featureSet == UseTopomerFeatures )
        fpats = orig_top_fpats;
    else
        fpats = Unityfpats_WeLike;

    currentSet = q_featureSet;

    memset((char *) sctrl, '\0', sizeof(Srch2Control) );
    sctrl->maxHits = 0;
    sctrl->searchControl = Srch2NoDuplicates;
    sctrl->charge = 1;
    sctrl->isotope = 1;
    sctrl->stereoSearch = 1;

    for ( numPatterns = 0, fptr = fpats; fptr->sln != (char *) 0; fptr++, numPatterns++ )
    {
        if ( !fptr->ct )
            fptr->ct = DB_IMPORT_SLN(fptr->sln);
        if ( !fptr->ct )
        {
            UTL_ERROR_CLEAR();
            fprintf(stderr, "Problems importing the feature pattern\n%s\n", fptr->sln );
            continue;
        }
        if ( !fptr->pattern && !DB_SRCH2_OPEN_PATTERN(fptr->ct, sctrl, &(fptr->pattern) ) )
        {
            UTL_ERROR_CLEAR();
            DB_CT_DELETE_CT(fptr->ct);
            fptr->pattern = (void *) 0;
            fprintf(stderr, "Problems building search pattern for the feature pattern\n%s\n",
                fptr->sln );
            continue;
        }
    }

    *r_numPatterns = numPatterns;
    if ( q_featureSet == UseUnityFeatures )
        return Unityfpats;
    else if ( q_featureSet == UseTopomerFeatures )
        return orig_top_fpats;
    else
        return Unityfpats_WeLike;
}
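
InitFeaturePatterns walks each pattern table until it reaches the terminating entry whose sln pointer is null, and SearchForFeatures later skips any entry whose weight is zero. The sketch below shows that table convention on a reduced, hypothetical structure: DemoPattern and both helper functions are assumptions made for illustration and carry only three of the fields of the listing's FeaturePattern.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for the listing's FeaturePattern (hypothetical). */
typedef struct {
    int         f_type;
    int         weight;  /* 0 means "think of it as commented out" */
    const char *sln;     /* NULL marks the end of the table */
} DemoPattern;

/* Number of entries before the NULL-sln sentinel. */
static int demo_count_patterns(const DemoPattern *table)
{
    int n = 0;
    if (!table)
        return 0;
    while (table[n].sln != NULL)
        n++;
    return n;
}

/* Number of entries that would actually be searched (non-zero weight). */
static int demo_count_active(const DemoPattern *table)
{
    int n = 0, i;
    if (!table)
        return 0;
    for (i = 0; table[i].sln != NULL; i++)
        if (table[i].weight != 0)
            n++;
    return n;
}
```

Setting a weight to zero, as the Unityfpats_WeLike table does for several donor and acceptor rules, disables a rule without disturbing the rule numbering that the debug output reports.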

static int computeCentroid( double *cords, int *atoms, int numAtoms, double *r_x, double *r_y, double *r_z )
{
    double x, y, z;
    double *cptr;
    int i;
    double divfact;

    if ( !cords || !atoms || numAtoms <= 0 || !r_x || !r_y || !r_z )
        return -1;

    divfact = (double) numAtoms;
    x = y = z = 0.0;
    for ( i = 0; i < numAtoms; i++ )
    {
        cptr = cords + (atoms[i] * 3);
        x += *cptr;
        y += *(cptr+1);
        z += *(cptr+2);
    }
    *r_x = x / divfact;
    *r_y = y / divfact;
    *r_z = z / divfact;
    return 0;
}
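
As a worked illustration of computeCentroid, the following self-contained sketch averages selected atoms from a flat x, y, z coordinate array, using the same atom-index-times-three addressing as the listing. The function name demo_centroid and its array-style output parameter are assumptions chosen for brevity; the listing itself returns the result through three separate pointers.

```c
#include <assert.h>

/* Average the coordinates of the atoms listed in atoms[0..numAtoms-1].
   cords is a flat array: atom a occupies cords[a*3 .. a*3+2].
   Returns 0 on success, -1 on bad input (same convention as the listing). */
static int demo_centroid(const double *cords, const int *atoms, int numAtoms,
                         double out[3])
{
    int i, axis;

    if (!cords || !atoms || numAtoms <= 0 || !out)
        return -1;
    out[0] = out[1] = out[2] = 0.0;
    for (i = 0; i < numAtoms; i++)
        for (axis = 0; axis < 3; axis++)
            out[axis] += cords[atoms[i] * 3 + axis];
    for (axis = 0; axis < 3; axis++)
        out[axis] /= (double) numAtoms;
    return 0;
}
```

For atoms 0 and 2 of the coordinates (0,0,0), (9,9,9), (2,4,6), the centroid is (1,2,3); atom 1 is ignored because it is not in the index list.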


/* Append a centroid (x, y, z, attFact) to the fragment unless one already
   lies within a squared distance of 0.1 of it. */
static void addCentroid(Frag *fptr, int natoms, double attFact, double x, double y, double z )
{
    double *cptr;
    double cdiff, xd, yd, zd;
    int i;
    int duplicate;

    for ( i = duplicate = 0; !duplicate && i < fptr->aromCnt; i++ )
    {
        cptr = fptr->cent + (i*4);
        xd = x - *cptr;
        yd = y - *(cptr+1);
        zd = z - *(cptr+2);
        cdiff = xd*xd + yd*yd + zd*zd;
        if ( cdiff < 0.1 )
            duplicate = 1;
    }

    if ( duplicate )
    {
        return;
    }

    fptr->cent = (double *) DB_CT_UTL_RECALLOC((char *) fptr->cent,
                     fptr->aromCnt * sizeof(double) * 4,
                     (fptr->aromCnt + 1) * sizeof(double) * 4 );
    cptr = fptr->cent + (fptr->aromCnt * 4);
    fptr->aromCnt++;

    *cptr = x;
    *(cptr+1) = y;
    *(cptr+2) = z;
    *(cptr+3) = attFact;
    return;
}
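The duplicate check and array growth in addCentroid can be sketched without the database toolkit. The following is a minimal illustration, assuming the standard realloc in place of DB_CT_UTL_RECALLOC; the PointList type and the function name are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical container: `count` records of 4 doubles (x, y, z, weight). */
typedef struct {
    double *data;
    int count;
} PointList;

/* Append (x, y, z, weight) unless an existing point has a squared distance
 * below tol2 from it. Returns 1 if appended, 0 if treated as a duplicate,
 * and -1 on allocation failure. */
int point_list_add_unique(PointList *list, double x, double y, double z,
                          double weight, double tol2)
{
    double *grown, *rec;
    int i;

    for (i = 0; i < list->count; i++) {
        const double *p = list->data + (size_t)i * 4;
        double dx = x - p[0], dy = y - p[1], dz = z - p[2];
        if (dx * dx + dy * dy + dz * dz < tol2)
            return 0;
    }

    /* grow by exactly one record, as the listing does */
    grown = realloc(list->data, ((size_t)list->count + 1) * 4 * sizeof(double));
    if (grown == NULL)
        return -1;
    list->data = grown;
    rec = grown + (size_t)list->count * 4;
    rec[0] = x;
    rec[1] = y;
    rec[2] = z;
    rec[3] = weight;
    list->count++;
    return 1;
}
```

Note that the tolerance is compared against the squared distance, so a value of 0.1 corresponds to a separation of roughly 0.32 in coordinate units.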


/* Debugging aid: report every field point at which the two fields differ by
   more than 0.1. The return value is the number of points examined. */
static int compareFields(double *orig, double *atombased, int npoints )
{
    int i;

    for ( i = 0; i < npoints; i++, orig++, atombased++ )
    {
        if ( ( fabs( *orig - *atombased) ) > 0.1 )
        {
            fprintf(stderr, "field difference: %d of %6.3lf %6.2lf %6.2lf\n",
                    i, *orig - *atombased, *orig, *atombased );
        }
    }

    return i;
}
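compareFields reports mismatches but returns only the number of points examined. A caller that needs a pass or fail answer might prefer a count of mismatches. The following is a minimal sketch of that variant, not the routine used in the listing; the function name, the tolerance parameter and the optional log stream are assumptions.

```c
#include <stdio.h>

/* Count the points at which two fields differ by more than tol.
 * If log is non-NULL, each mismatch is written to it.
 * Hypothetical variant for illustration only. */
int count_field_mismatches(const double *a, const double *b, int npoints,
                           double tol, FILE *log)
{
    int i, mismatches = 0;

    for (i = 0; i < npoints; i++) {
        double diff = a[i] - b[i];
        if (diff > tol || diff < -tol) {
            mismatches++;
            if (log != NULL)
                fprintf(log, "field difference: %d of %6.3f %6.2f %6.2f\n",
                        i, diff, a[i], b[i]);
        }
    }
    return mismatches;
}
```

A return value of zero would then indicate that the two fields agree everywhere within the tolerance.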


/* functions from here to "end of core funcs" are for core searching */ 

int TOP_CORE_QUERY( struct CtConnectionTable *ct, FILE *fp )
{
    static Split core_split[1];
    CtAtom *atom;
    int i;
    int k_atomicid = 19;
    int na_atomicid = 11;
    int Kid, Naid;
    int err = 0;
    int *atomMask;
    int *b1, *b2;
    int hevCnt;
    struct CtConnectionTable *dupct;
    Frag *f1, *f2;

    Kid = Naid = -1;

    /* the K and Na atoms mark the two attachment points of the core query;
       locate them and replace each with carbon */
    for ( atom = ct->atoms, i = 0; i < ct->atomCount; i++, atom++ )
    {
        if ( atom->class != CtAtomElement )
            continue;

        if ( atom->id.atomicNumber == k_atomicid )
        {
            if ( Kid >= 0 )
                fprintf(stderr,"More than one K atom present in core query\n"), err = 1;
            Kid = i;
            atom->id.atomicNumber = CARBON;
        }
        else if ( atom->id.atomicNumber == na_atomicid )
        {
            if ( Naid >= 0 )
                /* the error value is lost in the source listing; 2 is assumed
                   from the 1, 3, 4 sequence of the other codes */
                fprintf(stderr,"More than one Na atom present in core query\n"), err = 2;
            Naid = i;
            atom->id.atomicNumber = CARBON;
        }
    }
    if ( Kid == -1 )
    {
        fprintf(stderr, "No K atom present in the core query\n" );
        err = 3;
    }

    if ( Naid == -1 )
    {
        fprintf(stderr,"No Na atom present in the core query\n" );
        err = 4;
    }

    if ( err )
        return err;

    atom = ct->atoms + Naid;
    stripCharge(ct, atom, Naid);

    atom = ct->atoms + Kid;
    stripCharge(ct, atom, Kid);

    b1 = (int *) malloc(ct->atomCount * sizeof(int) );
    b2 = (int *) malloc(ct->atomCount * sizeof(int) );

    for ( i = 0; i < ct->atomCount; i++ )
    {
        b1[i] = b2[i] = 1;
    }

    b1[Kid] = -1; /* mark base atom */
    b2[Naid] = -1; /* mark base atom */

    memset((char *) core_split, '\0', sizeof(Split) );

    g_split2 = (split2 *) 0;
    g_split3 = (split3 *) 0;

    g_splitcnt = g_splitalloc = g_split3Cnt = g_split3Alloc = 0;

    atomMask = createAtomMask(ct, q_termFlag, &hevCnt);
    addSplit2(1, b1, b2 );

    core_split->frags = createUniqFrags(ct->atomCount, g_split2, g_splitcnt, g_split3, g_split3Cnt,
                            atomMask,
                            &(core_split->numFrags) );

    core_split->s2 = g_split2;
    core_split->s2cnt = g_splitcnt;
    core_split->bondCount = ct->bondCount;
    core_split->atomCount = ct->atomCount;
    core_split->atomMask = atomMask;

    g_split2 = (split2 *) 0;
    g_splitcnt = g_splitalloc = 0;

    core_split->ct = ct;
    SearchForFeatures(core_split);
    q_mode = 1;
    BuildFrags(core_split);
    BuildTopomers(ct, core_split, (Split *) 0);
    q_mode = 0;

    if ( core_split->frags && fp )
    {
        f1 = core_split->frags;
        f2 = core_split->frags + 1;

        /* write both query fragments with the K and Na markers restored */
        dupct = DB_CT_UTL_DUP_CT(f1->ct, CtCopyKeepAllAttrs );
        atom = dupct->atoms + Kid;
        atom->id.atomicNumber = k_atomicid;
        atom = dupct->atoms + Naid;
        atom->id.atomicNumber = na_atomicid;
        setAttr(dupct, "CORESIM", "0");
        setAttr(dupct, "TS_QID", "0");
        DB_CT_WRITE(fp, dupct);
        DB_CT_DELETE_CT(dupct);

        dupct = DB_CT_UTL_DUP_CT(f2->ct, CtCopyKeepAllAttrs );
        atom = dupct->atoms + Kid;
        atom->id.atomicNumber = k_atomicid;
        atom = dupct->atoms + Naid;
        atom->id.atomicNumber = na_atomicid;
        setAttr(dupct, "CORESIM", "0");
        setAttr(dupct, "TS_QID", "0");
        DB_CT_WRITE(fp, dupct);
        DB_CT_DELETE_CT(dupct);

        UTL_ERROR_CLEAR();
    }

    qs = core_split;
    return 0;
}

top_result *TOP_CORE_SEARCH(struct CtConnectionTable *ct, double radius, double max_attachpen,
                            int *r_hascore )
{
    Split *S;
    double f1, f2, f3, f4;
    double s1, s2, s3, s4;
    double a1, a2, a3, a4;
    Frag *q1, *q2;
    Frag *fs1, *fs2;
    split3 *ss3;
    double sval, sval2, sval3, sval4;
    int i, j;
    double best;
    double bestAttach;
    static top_result res[1];
    Frag *bestFrag, *altFrag;
    int idx = 0;
    CtAtom *atom, *atm2;
    char value[80];
    struct CtConnectionTable *dupct;
    int uniqId, hitId;

    memset((char *) res, '\0', sizeof(top_result) );
    q_bailout = radius * radius;

    /* all scores are compared as squared differences */
    max_attachpen *= max_attachpen;

    best = 999.9 * 999.9;
    bestAttach = max_attachpen;
    q_coremode = 1;

    S = FindBreakPoints(ct, q_minatoms, q_termFlag, TRUE );
    *r_hascore = 0;
    if ( !S || S->s3cnt == 0 )
    {
        q_coremode = 0;
        if ( S )
            freeSplit(S);
        return (top_result *) 0;
    }

    *r_hascore = S->s3cnt;

    S->ct = ct;
    SearchForFeatures(S);
    BuildFrags(S);

    q1 = qs->frags;
    q2 = qs->frags + 1;
    bestFrag = (Frag *) 0;

    /* first pass: discard three-way splits that cannot match, using only the
       attachment penalties and feature differences */
    for ( j = 0, ss3 = S->s3; ss3 && j < S->s3cnt; j++, ss3++ )
    {
        fs1 = S->frags + ss3->frag1;
        fs2 = S->frags + ss3->frag2;

        if ( fs1->cords == (double *) 0 || fs2->cords == (double *) 0 )
        {
            continue;
        }

        atom = fs1->ct->atoms + fs1->copyBaseAtom;
        atm2 = fs1->ct->atoms + fs2->copyBaseAtom;
        if ( atom->bondCount > 1 || atm2->bondCount > 1 )
        {
            fs1->cords = fs2->cords = (double *) 0;
            continue;
        }

        if ( q_debugfp )
        {
            DB_CT_WRITE(q_debugfp, fs1->ct );
            DB_CT_WRITE(q_debugfp, fs2->ct );
            UTL_ERROR_CLEAR();
        }

        a1 = computeAttachmentPenalty(q1, fs1, q2, fs2 );
        a2 = computeAttachmentPenalty(q2, fs1, q1, fs2 );
        a3 = computeAttachmentPenalty(q1, fs2, q2, fs1 );
        a4 = computeAttachmentPenalty(q2, fs2, q1, fs1 );

        if ( a1 > max_attachpen && a2 > max_attachpen && a3 > max_attachpen && a4 >
             max_attachpen )
        {
            fs1->cords = fs2->cords = (double *) 0;
            continue;
        }

        f1 = compareFeatures(qs, q1, S, fs1, q2->copyBaseAtom, fs2->copyBaseAtom );
        f2 = compareFeatures(qs, q2, S, fs1, q1->copyBaseAtom, fs2->copyBaseAtom );
        f3 = compareFeatures(qs, q1, S, fs2, q2->copyBaseAtom, fs1->copyBaseAtom );
        f4 = compareFeatures(qs, q2, S, fs2, q1->copyBaseAtom, fs1->copyBaseAtom );

        sval = f1 + a1;
        sval2 = f2 + a2;
        sval3 = f3 + a3;
        sval4 = f4 + a4;

        if ( sval > q_bailout && sval2 > q_bailout && sval3 > q_bailout && sval4 >
             q_bailout )
        {
            fs1->cords = fs2->cords = (double *) 0;
            continue;
        }
    }

    BuildTopomers(ct, S, (Split *) 0 );

    /* second pass: score the surviving splits in all four query/fragment
       pairings and keep the lowest combined score */
    for ( j = 0, ss3 = S->s3; ss3 && j < S->s3cnt; j++, ss3++ )
    {
        fs1 = S->frags + ss3->frag1;
        fs2 = S->frags + ss3->frag2;

        if ( fs1->cords == (double *) 0 || fs2->cords == (double *) 0 )
            continue;

        a1 = computeAttachmentPenalty(q1, fs1, q2, fs2 );
        a2 = computeAttachmentPenalty(q2, fs1, q1, fs2 );
        a3 = computeAttachmentPenalty(q1, fs2, q2, fs1 );
        a4 = computeAttachmentPenalty(q2, fs2, q1, fs1 );

        f1 = compareFeatures(qs, q1, S, fs1, q2->copyBaseAtom, fs2->copyBaseAtom );
        f2 = compareFeatures(qs, q2, S, fs1, q1->copyBaseAtom, fs2->copyBaseAtom );
        f3 = compareFeatures(qs, q1, S, fs2, q2->copyBaseAtom, fs1->copyBaseAtom );
        f4 = compareFeatures(qs, q2, S, fs2, q1->copyBaseAtom, fs1->copyBaseAtom );

        /* the trailing argument(s) of the four calls below are illegible in
           the source listing and are left as a gap */
        s1 = topFieldCompressedDiff(q1->qtf[fs1->regionIdx], fs1->topField, fs1->npoints,
                 /* ... */ );
        s2 = topFieldCompressedDiff(q2->qtf[fs1->regionIdx], fs1->topField, fs1->npoints,
                 /* ... */ );
        s3 = topFieldCompressedDiff(q1->qtf[fs2->regionIdx], fs2->topField, fs2->npoints,
                 /* ... */ );
        s4 = topFieldCompressedDiff(q2->qtf[fs2->regionIdx], fs2->topField, fs2->npoints,
                 /* ... */ );

        sval = f1 + a1 + s1;
        if ( sval < best && a1 < max_attachpen )
        {
            best = sval;
            res->hexDiffs[0] = s1;
            res->featureDiffs[0] = f1;
            res->attachmentPenalty = a1;
            bestFrag = fs1;
            altFrag = fs2;
            idx = 0;
        }

        sval = f2 + a2 + s2;
        if ( sval < best && a2 < max_attachpen )
        {
            best = sval;
            res->hexDiffs[0] = s2;
            res->featureDiffs[0] = f2;
            res->attachmentPenalty = a2;
            bestFrag = fs1;
            altFrag = fs2;
            idx = 1;
        }

        sval = f3 + a3 + s3;
        if ( sval < best && a3 < max_attachpen )
        {
            best = sval;
            res->hexDiffs[0] = s3;
            res->featureDiffs[0] = f3;
            res->attachmentPenalty = a3;
            bestFrag = fs2;
            altFrag = fs1;
            idx = 0;
        }

        sval = f4 + a4 + s4;
        if ( sval < best && a4 < max_attachpen )
        {
            best = sval;
            res->hexDiffs[0] = s4;
            res->featureDiffs[0] = f4;
            res->attachmentPenalty = a4;
            bestFrag = fs2;
            altFrag = fs1;
            idx = 1;
        }
    }

    if ( best < q_bailout )
    {
        if ( best < 0.0 )
            best = 0.0;
        res->comfa_diff = sqrt(best);
        sprintf(value, "%d", (int) res->comfa_diff );
        setAttr(bestFrag->ct, "CORESIM", value );

        sprintf(value, "%d", (int) sqrt(res->attachmentPenalty) );
        setAttr(bestFrag->ct, "TS_ATTACH_PEN", value );

        sprintf(value, "%d", (int) sqrt(res->featureDiffs[0]) );
        setAttr(bestFrag->ct, "TS_FEATURE", value );

        sprintf(value, "%d", (int) sqrt(res->hexDiffs[0]) );
        setAttr(bestFrag->ct, "TS_STERIC", value );

        sprintf(value, "%d", idx + 1 );
        setAttr(bestFrag->ct, "TS_QID", value );

        res->strFrags[0] = DB_CT_UTL_DUP_CT(S->ct, CtCopyKeepAllAttrs );
        res->strFrags[1] = DB_CT_UTL_DUP_CT(bestFrag->ct, CtCopyKeepAllAttrs );
        dupct = res->strFrags[1];

        /* restore the K (19) and Na (11) markers on the two attachment atoms */
        if ( idx == 1 )
        {
            atom = dupct->atoms + bestFrag->copyBaseAtom;
            atom->id.atomicNumber = 11;
            stripCharge(dupct, atom, bestFrag->copyBaseAtom );
            atom = dupct->atoms + altFrag->copyBaseAtom;
            atom->id.atomicNumber = 19;
            stripCharge(dupct, atom, altFrag->copyBaseAtom );
        }
        else
        {
            atom = dupct->atoms + bestFrag->copyBaseAtom;
            atom->id.atomicNumber = 19;
            stripCharge(dupct, atom, bestFrag->copyBaseAtom );
            atom = dupct->atoms + altFrag->copyBaseAtom;
            atom->id.atomicNumber = 11;
            stripCharge(dupct, atom, altFrag->copyBaseAtom );
        }

        dupCheckCore(dupct, &uniqId, &hitId );

        sprintf(value, "%d", uniqId );
        setAttr(dupct, "TS_UNIQ_ID", value );

        sprintf(value, "%d", hitId );
        setAttr(dupct, "TS_HIT_ID", value );

        freeSplit(S);
        q_coremode = 0;
        return res;
    }

    q_coremode = 0;
    freeSplit(S);
    return (top_result *) 0;
}
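The selection step in TOP_CORE_SEARCH, in which four query/fragment pairings are scored and the lowest total is kept only if its attachment penalty is acceptable, can be sketched on plain arrays. The following is a minimal illustration; the function name and array layout are hypothetical, and unlike the listing it takes the first qualifying candidate as the starting best rather than a large sentinel value.

```c
/* Pick the lowest combined score (feature + attachment + steric) among the
 * candidate pairings whose attachment penalty is below max_attach.
 * Returns the winning index, or -1 if no pairing qualifies; when a winner
 * exists and best_out is non-null, *best_out receives its total.
 * Hypothetical helper for illustration only. */
int pick_best_pairing(const double *feature, const double *attach,
                      const double *steric, int n, double max_attach,
                      double *best_out)
{
    int i, best_idx = -1;
    double best = 0.0;

    for (i = 0; i < n; i++) {
        double total = feature[i] + attach[i] + steric[i];
        if (attach[i] >= max_attach)
            continue;               /* attachment geometry too different */
        if (best_idx < 0 || total < best) {
            best = total;
            best_idx = i;
        }
    }
    if (best_idx >= 0 && best_out != 0)
        *best_out = best;
    return best_idx;
}
```

As in the listing, all inputs would be squared differences, so a caller would take the square root of the winning total before reporting it.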

/* Remove a positive formal charge from the given atom, if it carries one. */
static void stripCharge(struct CtConnectionTable *ct, CtAtom *aptr, int atomidx)
{
    int relop, charge;

    if ( aptr->attributeMask & CtAtomFormalCharge )
    {
        charge = 0;
        if ( DB_CT_GET_ANY_ATOM_ATTR(ct, atomidx + 1, CtAtomFormalCharge, &charge,
                                     &relop ) )
        {
            if ( charge > 0 )
                DB_CT_UTL_SUB_ANY_ATOM_ATTR(ct, atomidx + 1,
                                            CtAtomFormalCharge );
        }
        UTL_ERROR_CLEAR();
    }
}

/* Track unique hit structures by canonical SLN. Returns the running hit count
   for a structure seen before, or 0 for a new structure. */
static int dupCheckCore(struct CtConnectionTable *ct, int *r_uniqid, int *r_hitid )
{
    static UniqSln *uniqSlns;
    static int uniqAlloc;
    static int uniqCnt;
    UniqSln *uptr;
    int i;
    struct CtConnectionTable *dupct;
    char *sln;
    unsigned int crc;

    dupct = DB_CT_UTL_DUP_CT(ct, CtCopyKeepAttrs );
    DB_CT_UNIQ(dupct);

    sln = DB_CT_SLN_GENERATE_NOATTR(dupct, (int **) 0);
    crc = DB_CT_HOLO_GEN_CRC(sln);

    DB_CT_DELETE_CT(dupct);

    /* compare the CRC first so that strcmp runs only on likely matches */
    for ( i = 0, uptr = uniqSlns; i < uniqCnt; i++, uptr++ )
    {
        if ( uptr->crc == crc && !strcmp(uptr->sln, sln ) )
        {
            uptr->hitcnt++;
            *r_uniqid = i+1;
            *r_hitid = uptr->hitcnt;
            UTL_MEM_FREE(sln);
            return uptr->hitcnt;
        }
    }

    if ( uniqCnt >= uniqAlloc )
    {
        if ( uniqSlns )
        {
            uniqAlloc *= 2;
            uniqSlns = (UniqSln *) realloc((char *) uniqSlns, uniqAlloc * sizeof(UniqSln) );
        }
        else
        {
            uniqAlloc = 100;
            uniqSlns = (UniqSln *) malloc(uniqAlloc * sizeof(UniqSln) );
        }
    }

    uptr = uniqSlns + uniqCnt;
    uptr->sln = sln;
    uptr->crc = crc;
    uptr->hitcnt = 1;
    uniqCnt++;

    *r_uniqid = uniqCnt;
    *r_hitid = uptr->hitcnt;

    return 0;
}
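The duplicate bookkeeping in dupCheckCore, a growing table keyed by a checksum with a string comparison to confirm each match, can be sketched without the toolkit calls. The following is a minimal illustration; the types, the function names and the simple multiplicative hash are all assumptions, the hash stands in for DB_CT_HOLO_GEN_CRC, and the return convention (0 on success, -1 on allocation failure) differs from that of the listing.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical table of unique keys with occurrence counts. */
typedef struct {
    char *key;
    unsigned int hash;
    int hits;
} UniqEntry;

typedef struct {
    UniqEntry *items;
    int count, alloc;
} UniqTable;

/* Stand-in checksum; NOT the CRC used by the listing. */
static unsigned int toy_hash(const char *s)
{
    unsigned int h = 5381u;
    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h;
}

/* Record one occurrence of key. On success returns 0 and sets *uniq_id to the
 * 1-based position of the key and *hit_id to its occurrence count so far. */
int uniq_table_record(UniqTable *t, const char *key, int *uniq_id, int *hit_id)
{
    unsigned int h = toy_hash(key);
    size_t len;
    char *copy;
    int i;

    for (i = 0; i < t->count; i++) {
        /* cheap integer test first, string comparison only on a hash match */
        if (t->items[i].hash == h && strcmp(t->items[i].key, key) == 0) {
            t->items[i].hits++;
            *uniq_id = i + 1;
            *hit_id = t->items[i].hits;
            return 0;
        }
    }
    if (t->count >= t->alloc) {
        /* start at 100 entries and double thereafter, as in the listing */
        int new_alloc = t->alloc ? t->alloc * 2 : 100;
        UniqEntry *grown = realloc(t->items, (size_t)new_alloc * sizeof(UniqEntry));
        if (grown == NULL)
            return -1;
        t->items = grown;
        t->alloc = new_alloc;
    }
    len = strlen(key) + 1;
    copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, key, len);
    t->items[t->count].key = copy;
    t->items[t->count].hash = h;
    t->items[t->count].hits = 1;
    t->count++;
    *uniq_id = t->count;
    *hit_id = 1;
    return 0;
}
```

The linear scan is adequate for modest hit lists; a larger table would presumably call for a hashed lookup instead.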
/* Build an all-against-all dissimilarity matrix for a list of SLN strings. */
int *TOP_MATRIX_SEARCH(char **slns, int numSlns )
{
    int i, j;
    int *matrix;
    int offset;
    struct CtConnectionTable *ct;
    struct CtConnectionTable *largest;
    Split **splits;
    Split *S;
    Split *QS;
    double *cord;
    int natoms;
    Frag *fptr;
    double comfa_diff;
    double radius;
    int nParts;
    int idx;
    int modified;
    int junk;
    double junk2;
    int qidx, sidx, splitidx, splitInThree;
    double best2;
    double best3;
    double attachPen;
    int bailedout = 0;
    int tfrags = 0;

    matrix = (int *) malloc( numSlns * numSlns * sizeof(int) );
    splits = (Split **) calloc(numSlns, sizeof(Split *) );
    radius = 2000.0;
    q_bailout = radius * radius; /* just force it very high */

#if 0
    q_minatoms = 3;
    q_termFlag = 1;
#endif

    q_matrixMode = 1;
    TOP_STER_REGION_MODE(2);

    /* pass 1: fragment every structure and build its topomer fields */
    for ( i = 0; i < numSlns; i++ )
    {
        fprintf(stderr, "initializing %d for matrix total Frags: %d\n", i+1, tfrags);
        ct = DB_IMPORT_SLN(slns[i]);
        if ( !ct )
        {
            UTL_ERROR_CLEAR();
            splits[i] = (Split *) 0;
            continue;
        }

        cord = (double *) 0;
        DB_CT_GET_CT_ATTR(ct, CtCt3DCoordSet, &cord, &natoms );
        if ( !cord )
        {
            DB_CT_DELETE_CT(ct);
            splits[i] = (Split *) 0;
            continue;
        }

        DB_CT_UTL_COUNT_FRAGS(ct, 0, (int *) 0, 0, (int *) 0, &nParts );
        if ( nParts > 1 )
        {
            largest = getLargestFrag(ct);
            DB_CT_DELETE_CT(ct);
            ct = largest;
        }

        DB_CT_NORM_AROM(ct);
        DB_CT_STANDARD(ct, &modified);
        DB_CT_UTL_FIND_RINGS(ct);
        UTL_ERROR_CLEAR();

        S = FindBreakPoints(ct, q_minatoms, q_termFlag, TRUE );
        if ( q_termFlag )
            j = q_minatoms - 1;
        else
            j = q_minatoms;

        /* relax the minimum fragment size until a two-way split is found */
        while ( (!S || S->s2cnt == 0) && j >= 3 )
        {
            if ( S )
                freeSplit(S);
            S = FindBreakPoints(ct, j, 0, TRUE );
            q_minatoms = j;
            j--;
        }

        if ( S && S->s2cnt == 0 )
        {
            freeSplit(S);
            S = (Split *) 0;
        }

        splits[i] = S;
        if ( !S )
            continue;

        tfrags += S->numFrags;

        S->ct = ct;
        SearchForFeatures(S);
        BuildFrags(S);
        BuildTopomers(ct, S, (Split *) 0);

        for ( j = 0, fptr = S->frags; j < S->numFrags; j++, fptr++ )
        {
            fptr->qtf[0] = fptr->topField;
        }

        freeFragCts(S);
    }

    fprintf(stderr, "Finished initializing for matrix\n");

    /* pass 2: compare every structure against every other */
    for ( i = 0; i < numSlns; i++ )
    {
        QS = splits[i];
        qs = QS;

        for ( j = 0; j < numSlns; j++ )
        {
            idx = i*numSlns + j;
            if ( i == j )
            {
                matrix[idx] = 0;
                continue;
            }

            S = splits[j];
            if ( !QS || !S )
            {
                if ( !QS && !S )
                    matrix[idx] = 0; /* both don't have coordinates */
                else
                    matrix[idx] = 5000; /* one of them doesn't */
                continue;
            }

            if ( q_featureFactor > 0.0 )
                comfa_diff = CompareAllFeatures(QS, S, radius);
            else
                comfa_diff = CompareTwoCompounds(QS, S, radius, &qidx, &sidx, &splitidx,
                                 &splitInThree, &junk,
                                 &best2, &best3, &junk2, &attachPen, bailedout );

            matrix[idx] = (int) comfa_diff;
        }

        freeStrMap(QS);
        fprintf(stderr, "pass %d complete\n", i+1 );
    }

    q_matrixMode = 0;
    return matrix;
}
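The matrix-filling logic of TOP_MATRIX_SEARCH can be shown independently of the topomer comparison. The following is a minimal sketch, assuming a row-major n x n integer matrix and a caller-supplied scoring callback; pairwise_matrix, PairScore and abs_diff_score are hypothetical names, and abs_diff_score is only a toy stand-in for the real comparison routines.

```c
#include <stdlib.h>

/* Hypothetical scoring callback: dissimilarity between two prepared items. */
typedef double (*PairScore)(const void *a, const void *b);

/* Toy scorer for demonstration: absolute difference of two doubles. */
double abs_diff_score(const void *a, const void *b)
{
    double d = *(const double *)a - *(const double *)b;
    return d < 0.0 ? -d : d;
}

/* Fill an n x n row-major matrix of integer dissimilarities. items[i] may be
 * NULL when entry i could not be prepared: NULL against NULL scores 0 and
 * NULL against non-NULL scores `missing`. The caller frees the result. */
int *pairwise_matrix(const void *const *items, int n, PairScore score, int missing)
{
    int i, j;
    int *m;

    if (n <= 0)
        return NULL;
    m = malloc((size_t)n * (size_t)n * sizeof(int));
    if (m == NULL)
        return NULL;

    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            int *cell = &m[(size_t)i * n + j];
            if (i == j)
                *cell = 0;                      /* diagonal */
            else if (items[i] == NULL || items[j] == NULL)
                *cell = (items[i] == NULL && items[j] == NULL) ? 0 : missing;
            else
                *cell = (int)score(items[i], items[j]);
        }
    }
    return m;
}
```

The listing uses 5000 as the value for a pair in which only one structure has coordinates, so a caller following it would pass 5000 as `missing`.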


/* Return a copy of the largest disconnected piece of ct. */
struct CtConnectionTable *getLargestFrag(struct CtConnectionTable *ct )
{
    struct CtConnectionTable **cts;
    struct CtConnectionTable *largest;
    int maxAtoms;
    int currAtoms;
    int *whichPiece;
    int nParts;
    int idx;
    int *atoms;
    int natoms;
    int i;
    int *ordering;

    DB_CT_UTL_SPLIT_CT(ct, &nParts, &cts, &whichPiece, (int **) 0);
    largest = cts[0];

    DB_CT_GET_CT_ATTR(largest, CtCtAtomCount, &maxAtoms );
    idx = 1;
    for ( i = 1; i < nParts; i++ )
    {
        DB_CT_GET_CT_ATTR(cts[i], CtCtAtomCount, &currAtoms );
        if ( currAtoms > maxAtoms )
        {
            largest = cts[i];
            maxAtoms = currAtoms;
            /* the right-hand side is lost in the source listing; i + 1 is
               assumed because piece numbers start at 1 (idx = 1 for cts[0]) */
            idx = i + 1;
        }
    }

    /* collect the 1-based ids of the atoms belonging to the largest piece */
    atoms = (int *) calloc(ct->atomCount, sizeof(int) );
    for ( natoms = 0, i = 1; i <= ct->atomCount; i++ )
    {
        if ( whichPiece[i] == idx )
        {
            atoms[natoms] = i;
            natoms++;
        }
    }

    largest = DB_CT_UTL_COPY_CT(ct, natoms, atoms, &ordering, CtCopyKeepAllAttrs );
    for ( i = 0; i < nParts; i++ )
        DB_CT_DELETE_CT(cts[i]);
    free((char *) atoms );

    return largest;
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